March 16, 2004

Nicholas Wade on Gray and Atkinson

Today's New York Times contains a piece by Nicholas Wade based on the paper by Gray and Atkinson on the dating of the divergence of Indo-European that both I and Mark Liberman commented on a while back. It doesn't add anything new to our previous discussion, so I'll defer further comment on Gray and Atkinson and more generally the problem of subgrouping and dating to another occasion. What I'd like to comment on now is a much larger confusion that pervades the piece.

Wade begins with a comment on how great it would be if we could reconstruct the family tree of all human languages. He then writes:

Yet in the view of many historical linguists, the chances of drawing up such a tree are virtually nil and those who suppose otherwise are chasing a tiresome delusion.
Languages change so fast, the linguists point out, that their genealogies can be traced back only a few thousand years at best before the signal dissolves completely into noise: witness how hard Chaucer is to read just 600 years later.
But the linguists' problem has recently attracted a new group of researchers who are more hopeful of success. They are biologists who have developed sophisticated mathematical tools for drawing up family trees of genes and species. Because the same problems crop up in both gene trees and language trees, the biologists are confident that their tools will work with languages, too.
This shows a fundamental misunderstanding of what is at issue.

There are two aspects to classifying languages. One is showing that they are related at all. We don't know a priori that human languages are all related to each other. In fact, if you include the signed languages, we know for certain that they are not. But for the oral languages, we just don't know. The other aspect consists of determining how languages are related, once you know that they are. This is known as subgrouping since it consists of determining what the subgroups of the language family are, and in turn what the subgroups of the subgroups are, and so forth, ultimately resulting in a family tree. Each branch of the family tree represents the divergence of two or more languages, an event that took place at some time in the past. Ideally, we'd like to be able to assign dates to those events.

The problem that Wade refers to in the passages cited is the first problem, that of establishing that all (oral) languages are related. The mainstream view of historical linguists is that this has yet to be demonstrated and probably never will be, even though it may be true. To see why, let's review what is involved in showing that languages are related. We need to show that the languages exhibit similarities that can only be explained by the hypothesis of common descent. To do this, we need to show:

  • That the similarities are not attributable to innate linguistic universals;
  • That the similarities are unlikely to be due to chance;
  • That the similarities are unlikely to be due to diffusion.
The first point is easy. If there are properties common to all human languages that are due to the way our minds and bodies work or to the way signal channels work, they are explained by a hypothesis other than common descent and therefore provide no evidence for common descent. The second point reflects the fact that it is fairly easy to find meaningless random similarities between languages, especially if you allow yourself to go fishing among a lot of languages. The third point reflects the fact that in addition to innate universals and common descent, languages may be similar because they have borrowed from each other. The fact that English has words such as zen and samurai which closely resemble Japanese words is not evidence that English and Japanese are genetically related because we know that these words are fairly recent loans.
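
To make the second point concrete, here is a minimal sketch of why fishing among many languages inflates chance resemblances. It rests on assumptions that are not in the discussion above: each word comparison is treated as independent, and the per-comparison chance-match probability `p_single` is a made-up illustrative figure, not an empirical estimate. The function names are hypothetical.

```python
def count_comparisons(n_other_languages: int, n_meanings: int) -> int:
    """Number of word-to-word comparisons when one language's word list
    is checked against each of several other languages, meaning by meaning."""
    return n_other_languages * n_meanings


def p_at_least_one_match(p_single: float, n_comparisons: int) -> float:
    """Probability that at least one comparison yields a chance resemblance.

    ASSUMPTION: comparisons are independent and each has the same
    probability p_single of matching by accident.
    """
    return 1.0 - (1.0 - p_single) ** n_comparisons


if __name__ == "__main__":
    P_SINGLE = 0.01   # illustrative only: 1% accidental resemblance per comparison
    N_MEANINGS = 100  # illustrative word-list length

    for n_langs in (1, 5, 20):
        n = count_comparisons(n_langs, N_MEANINGS)
        p = p_at_least_one_match(P_SINGLE, n)
        print(f"{n_langs:>2} other languages, {n:>4} comparisons: "
              f"P(at least one chance match) = {p:.3f}")
```

Under these toy assumptions, even a small per-comparison probability makes at least one accidental resemblance very likely once enough languages are searched, which is why a handful of lookalikes proves little on its own.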

Thus far, claims of very large-scale genetic relationship, such as the Amerind, Indo-Pacific, Eurasiatic, Nostratic, and Vasco-Dene language families, or the even stronger claim that Proto-World has been reconstructed, have not been generally accepted. One reason for this is not, strictly speaking, methodological. A good deal of this work is based on extremely poor data. Some of the leading proponents of such hypotheses have been found to have been incredibly slipshod in their handling of the data. The phonetic form of words cited is often wrong, or the meaning is incorrect. Sometimes the cited words don't come from the language they are supposed to come from. Very frequently, words are given incorrect or unjustifiable morphological analyses. In theory, of course, large-scale comparisons can be done competently, and some are, but a good deal of this kind of work has been invalidated on the grounds of inadequate data handling.

A second reason for skepticism about the cases that have been made thus far is that they don't pass the statistical test. By and large, they involve similarities so few and vague overall that we are not persuaded that they are not attributable to chance. We're also skeptical about the possibility of more persuasive arguments being made in the future because, as Wade mentions, languages change sufficiently fast over time that the "signal" of relationship at great time depth is likely to be very weak and overridden by the "noise". It's important to understand that nobody is claiming that there is an absolute limit. We can't say: "Languages change at such a rate that the remotest relationship that could be demonstrated is X thousand years ago. Any claim beyond that is bunk a priori." We're just saying that the only satisfactorily demonstrated relationships go back no more than about 10,000 years, and that since human language probably goes back at least 50,000 years, there is a large gap between the date of Proto-World, if there was such a language, and anything thus far demonstrated. It is possible that by chance some evidence may have survived long enough; if so, we'd love to see it. But until somebody provides convincing evidence of genetic relationship at great time depth, there's no case.

The third reason for skepticism is that, thanks to research over the past twenty years or so (much of it by our own Sally Thomason), we know a lot more about language contact than we used to. In particular, we've learned that massive borrowing does occur, that grammatical structures can be borrowed, and that borrowing of basic vocabulary is more common than we thought. This means that we have to be more concerned than we used to be about the possibility that non-chance similarities between languages are due to borrowing rather than common descent. This problem becomes more severe the farther back we go, both because the total amount of evidence becomes smaller and because the farther back we go the less likely we are to know anything about the external history of the languages, that is, who was in contact with whom and what the nature of the contact situation was. As a result, at great time depth we are in a poor position to distinguish genetic affiliation from diffusion. Of course, borrowing can also skew subgrouping, so our improved knowledge of language contact phenomena poses a problem there too.

The problems that Wade's opening alludes to are the problems of determining whether languages are related at all. The problem addressed by Gray and Atkinson and by related work is the other problem, that of subgrouping and dating. Their method presupposes not only that we know that the languages in question are related, but that we have reconstructed the details of that relationship so that we can determine which words in the daughter languages are cognate. So, even if Gray and Atkinson's approach works, it will only provide a new and better means of subgrouping and dating languages known to be related. It won't help in the slightest to demonstrate relationship in the first place.
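
To illustrate why cognate judgements have to come first, here is a rough sketch of how they might be turned into the kind of presence/absence characters that tree-building software typically consumes. This is an assumption about the general shape of such input, not a description of Gray and Atkinson's actual pipeline. The function name, the data layout, and the arbitrary class labels "A" and "B" are hypothetical, and the three-word sample is only a toy. Note that the cognate classes are supplied by hand: nothing in the code can discover them.

```python
from typing import Dict, List, Tuple

# meaning -> language -> cognate class label (labels are arbitrary).
# These judgements are INPUT: they come from prior comparative work.
COGNATE_JUDGEMENTS: Dict[str, Dict[str, str]] = {
    # hand / Hand vs. main / mano (Latin manus)
    "hand": {"English": "A", "German": "A", "French": "B", "Spanish": "B"},
    # three / drei / trois / tres are all cognate
    "three": {"English": "A", "German": "A", "French": "A", "Spanish": "A"},
    # water / Wasser vs. eau / agua (Latin aqua)
    "water": {"English": "A", "German": "A", "French": "B", "Spanish": "B"},
}


def encode_binary_characters(
    judgements: Dict[str, Dict[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[int]]]:
    """Turn cognate classes into binary characters.

    Each (meaning, class) pair becomes one character; a language scores 1
    if its word for that meaning belongs to that class, otherwise 0.
    """
    characters: List[Tuple[str, str]] = []
    for meaning in sorted(judgements):
        for label in sorted(set(judgements[meaning].values())):
            characters.append((meaning, label))

    languages = sorted({lang for row in judgements.values() for lang in row})
    matrix: Dict[str, List[int]] = {
        lang: [1 if judgements[m].get(lang) == c else 0 for m, c in characters]
        for lang in languages
    }
    return characters, matrix


if __name__ == "__main__":
    chars, matrix = encode_binary_characters(COGNATE_JUDGEMENTS)
    print("characters:", chars)
    for lang, row in matrix.items():
        print(f"{lang:<8}", row)
```

For unrelated languages, or languages whose relationship has not been worked out, there would be no principled way to fill in the class labels, which is the point of the paragraph above.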

Posted by Bill Poser at March 16, 2004 01:39 PM