September 29, 2004

Splitters and Lumpers - Then and Now

As Mark has pointed out, the division between "lumpers" and "splitters" goes back a long way, and in their time, Benjamin Smith Barton and Thomas Jefferson represented opposing camps. But there is a big difference between the situation in 1804 and the situation now, two centuries later. In Jefferson and Barton's day, very little was known about historical linguistics, either about how languages change or about the problems of determining how languages are related to each other. There had been isolated sprouts of good ideas - both morphological evidence and sound laws had been used as arguments for linguistic relationship - but scientific historical linguistics had yet to develop. As a result, reasonable people could hold different views. Jefferson's position resonates with us today both because he turned out to be right and because his argument is a bit more nuanced, but at the time one couldn't say definitively that he was right on the basis of a deep understanding of the problem.

The situation is quite different today. We now have a much better understanding of the problem of determining whether and how languages are related, and we know quite a bit about how languages change. We've also had a good bit of experience, from which we have learned about what works and what doesn't. Among the things we've learned:

  • The probability of chance resemblance among languages is sufficiently high that we must make an effort to determine that the similarities we notice are not due to chance. The first discussion of the mathematics of linguistic comparison did not appear until 1819, four years after Barton's death, and it was wrong.
  • Some similarities among languages are due to the fact that the relationship between sound and meaning is not entirely arbitrary. These similarities are therefore not relevant as evidence of linguistic relationship.
  • Some grammatical properties of languages are either universal or admit of only a few possibilities, rendering them useless as evidence of linguistic relationship.
  • Sound change is regular, allowing us to establish sound correspondences between related languages.
  • Sound correspondences greatly reduce the number of degrees of freedom and so greatly reduce the probability of chance relationship.
  • Multiple sets of sound correspondences can be used to distinguish loans from inherited words.
  • Languages can borrow large amounts of vocabulary, including basic vocabulary.
  • In some circumstances, languages borrow morphology as well as vocabulary.
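The points about chance resemblance and sound correspondences can be illustrated with a small calculation. What follows is a minimal sketch, not a model anyone has fitted to real data: it treats each word comparison as an independent trial, and the list length and the two per-word match probabilities are made-up numbers chosen only to show the shape of the effect. The function name prob_at_least is likewise just a label for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more chance matches among n independent
    word comparisons, each with chance-match probability p
    (upper tail of the binomial distribution)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical figures, for illustration only.
n_words = 100      # size of the comparison word list
loose_p = 0.05     # "looks similar" criterion: many sounds count as a match
strict_p = 0.005   # match must obey a fixed set of sound correspondences

for label, p in [("loose similarity", loose_p), ("sound correspondences", strict_p)]:
    expected = n_words * p
    tail = prob_at_least(5, n_words, p)
    print(f"{label}: expected chance matches = {expected:.1f}, "
          f"P(at least 5 by chance) = {tail:.4f}")
```

Under these assumed numbers the loose criterion is expected to turn up about five chance matches in a hundred words, so finding five proves nothing, while the strict criterion is expected to turn up about half of one, so five matches would be hard to attribute to chance. Real vocabularies are not independent trials, which is one reason the actual calculations are harder than this.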

What we've learned has had two consequences. On the one hand, it has made us much more skeptical about proposed linguistic relationships. We know that we need to determine whether similarities are due to chance, that we need to exclude from consideration certain kinds of words, such as mama and papa words and onomatopoeia, and that we have to look very carefully at the possibility that we are dealing with borrowing rather than common descent. On the other hand, we've learned how to reconstruct unattested proto-languages from their attested descendants and how to work out the family tree of related languages. We've also learned a lot about the mechanisms of linguistic change.

There are plenty of things we don't know, but we know so much more now than in Jefferson and Barton's time that some ideas that were reasonable then are not reasonable now. One of these is the idea that a small number of vaguely perceived similarities between languages constitutes evidence of common descent. One sometimes sees the difference between splitters and lumpers presented as one of taste and personality. That isn't accurate. There may be such differences, but the disputes between mainstream historical linguists and "long-rangers" like Joseph Greenberg and Merritt Ruhlen are about methodology, namely whether historical relationships must be established by the comparative method or whether superficial lexical comparison is a valid alternative. When one looks at the evidence, the outcome is clear. The mathematics of probability shows that superficial lexical comparison fails to provide evidence that similarities are not due to chance. Even if similarities are not likely to be due to chance, superficial lexical comparison is unable to distinguish between borrowing and common descent.
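One way to see why the probability argument goes against superficial lexical comparison is to count how many chances to find a match the method gives itself. The sketch below assumes, purely for illustration, that each candidate comparison is independent and has the same small probability of looking like a match; the function name prob_some_match and all the numbers are hypothetical.

```python
def prob_some_match(p, languages, meanings):
    """Probability that a given word finds at least one chance look-alike
    when it may be matched against any of `languages` other languages and,
    in each, against any of `meanings` loosely related meanings.
    Assumes every candidate comparison is independent with probability p."""
    candidates = languages * meanings
    return 1 - (1 - p) ** candidates

# Hypothetical figures, for illustration only.
p = 0.01  # chance that one specific pair of words looks similar

print(prob_some_match(p, languages=1, meanings=1))   # one language, exact meaning
print(prob_some_match(p, languages=20, meanings=5))  # many languages, loose meanings
```

With one language and one meaning the chance of a spurious match is just p. With twenty languages and five acceptable meanings in each, the same word has a hundred opportunities to find a look-alike, and under these assumptions the chance of at least one exceeds sixty percent. A comparison that permits this much latitude will find "cognates" whether or not the languages are related.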

These conclusions derive from our understanding of language change and of the problem of determining linguistic relationship; they are supported by the history of historical linguistics. Experience has shown that superficial lexical comparison leads to results that subsequently prove to be incorrect. In 1901 in his book Die Sprachwissenschaft Georg von der Gabelentz made this point eloquently in a passage (pp. 164-168) in which he pointed out that Franz Bopp, deservedly famous for his work on Indo-European morphology, had gone astray when he made claims about linguistic relationships without following the comparative method. Here is the German original followed by my translation. (Lyle Campbell and I have discussed this in our paper Indo-European Practice and Historical Methodology [PDF file].)

Es ist schrecklich verführerisch in der Sprachenwelt umherzuschwärmen, drauf los Vocabeln zu vergleichen und dann die Wissenschaft mit einer Reihe neu entdeckter Verwandtschaften zu beglücken. Es kommen auch schrecklich viele Dummheiten dabei heraus; denn allerwärts sind unmethodische Köpfe die vordringlichsten Entdecker. Wer mit einem guten Wortgedächtnisse begabt ein paar Dutzend Sprachen verschiedener Erdtheile durchgenommen hat, - studirt braucht er sie gar nicht zu haben, - der findet überall Anklänge. Und wenn er sie aufzeichnet, ihnen nachgeht, verständig ausprobirt, ob sich die Anzeichen bewähren: so thut er nur was recht ist. Allein dazu gehört folgerichtiges Denken, und wo das nicht von Hause aus fehlt, da kommt es gern im Taumel der Entdeckungslust abhanden. So ging es, wie wir sahen, dem grossen Bopp, da er es versuchte, kaukasische und malaische Sprachen dem indogermanischen Verwandtschaftskreise zuzuweisen. Das Schicksal hatte es merkwürdig gefügt. Es war, als hätte er die Richtigkeit seiner Grundsätze doppelt beweisen sollen, erst positiv durch sein grossartiges Hauptwerk, das auf ihnen beruht, - dann negativ, indem er zu Schaden kam, sobald er ihnen untreu wurde... Die Sprachen sind verschieden, denn die Lautentwickelung hat verschiedene Wege eingeschlagen. Hüben und drüben aber ist sie ihre Wege folgerichtig gegangen; darum herrscht in den Verschiedenheiten Ordnung, nicht Willkür. Sprachvergleichung ohne Lautvergleichung ist gedankenlose Spielerei.
It is terribly seductive to roam the world of languages, comparing words at random, and then to bestow upon scholarship a series of newly discovered relationships. A great many stupidities also result from this, for everywhere it is the unmethodical minds that are the most forward discoverers. He who, endowed with a good memory for words, has gone through a couple of dozen languages from different parts of the Earth (he need not have studied them at all) finds resemblances everywhere. And if he records them, follows them up, and tests intelligently whether the indications hold up, he does only what is right. But that requires logically consistent thought, and where it is not lacking from the outset, it is easily lost in the giddiness of the joy of discovery. Thus it went, as we saw, with the great Bopp, when he sought to assign Caucasian and Malayan languages to the Indo-European language family. Fate had arranged things curiously. It was as if he had been meant to prove the correctness of his principles twice, first positively through his magnificent main work, which is based on them, then negatively, by coming to grief as soon as he was unfaithful to them ... Languages are different because sound change has taken different paths. But on each side it has gone its way consistently; therefore order reigns in the differences, not arbitrariness. Language comparison without comparison of sounds is thoughtless game-playing.

The final nail in the coffin for superficial lexical comparison is that it has proven barren. When linguistic relationships are established by the comparative method, the evidence that there is a relationship is the beginning, not the end. Historical linguists then move on to reconstruct the proto-language, work out the family tree, and figure out what changes took place and often how and why. Superficial lexical comparison yields no such fruit.

In 1804 it wasn't crazy to be a lumper like Benjamin Smith Barton, but it is today because over the last two centuries we have come to know better. The disagreement between Jefferson and Barton may have been largely one of taste, but the disagreement today between mainstream historical linguists and "long-rangers" like Greenberg and Ruhlen is not; it's a disagreement between science and quackery.

