June 02, 2007

Dediu and Ladd again: correlations and mechanisms

This is a guest post by D. Robert ("Bob") Ladd, co-author with Dan Dediu of "Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin", PNAS, published online before print May 30, 2007. As Bob explains, "To avoid confusion with the referents of I and we, I've put this over my own signature only, but it comes from both of us", that is, from both authors.

Mark's discussion of the statistical reasoning in Dan Dediu's and my PNAS paper on the possible link between population genetics and language typology is interesting and useful. For those readers who want to know more about the statistical techniques we used, Dan has put up an addition to our "further information" site, specifically about the stats.

The main point we'd like to emphasize here, since this has been a matter of some misunderstanding (though not on Mark's part), is that we did NOT go trawling through a giant database of gene/language correlations and look for a good one. We were looking specifically to see if the hypothesized correlations between tone and the two genes under discussion stood out from the great mass of gene/language correlations, and they did.
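As a rough illustration of what "standing out from the great mass" means, here is a minimal sketch in Python. It is not the analysis from the paper: the data are purely synthetic, and the population, marker, and feature counts, along with the function names (`background_correlations`, `tail_fraction`), are hypothetical choices made only for this example. It builds a background distribution of correlations between unrelated random "markers" and "features", then asks what fraction of that background is at least as strong as a given target correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def background_correlations(n_pops, n_markers, n_features, rng):
    """Absolute correlations between every synthetic marker and every synthetic feature.

    Markers are random frequencies in [0, 1); features are random 0/1 values.
    All of them are independent, so any strong correlation here is coincidence.
    """
    markers = [[rng.random() for _ in range(n_pops)] for _ in range(n_markers)]
    features = [[rng.randint(0, 1) for _ in range(n_pops)] for _ in range(n_features)]
    return [abs(pearson(m, f)) for m in markers for f in features]


def tail_fraction(target, background):
    """Fraction of background correlations at least as strong as the target."""
    return sum(1 for r in background if r >= target) / len(background)


# Hypothetical sizes, chosen only to keep the example fast.
rng = random.Random(0)
background = background_correlations(n_pops=40, n_markers=50, n_features=20, rng=rng)

strongest = max(background)
median = sorted(background)[len(background) // 2]
print("pairs compared:", len(background))
print("median |r|:", round(median, 3))
print("strongest |r| arising by chance:", round(strongest, 3))
print("tail fraction of the strongest:", tail_fraction(strongest, background))
```

Even though every pair in this toy background is unrelated by construction, the strongest correlation sits far above the median. That is why the comparison only means something when the pair being ranked was specified in advance, rather than picked because it happened to come out on top.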

That said, Mark's commentary raises three points that we'd like to amplify a bit.

(1) It's not clear what our statistical null hypothesis should be or how to control for coincidence; as he says, "even in a distribution created by completely random effects ... SOMETHING has to be way out in the tail." We did what we could to rule out coincidence as an explanation for our findings, but in the end it could still just be coincidence.

(2) Consequently, we've gone about as far as we can go with statistics; the only real confirmation that we are onto something will now come from experimental work demonstrating the existence of the hypothesized genetically-induced "cognitive bias" in individuals, followed by studies clarifying the neurological basis of the bias. As Daniel Nettle says in his Commentary on the print version of our paper (appearing soon), our work is really hypothesis-generating rather than hypothesis-testing. We are now generating precise hypotheses about the nature of the bias, and hope to start testing them soon.

(3) Up to a point, Mark is right that our original hypothesis was not much more than a hunch based on human pattern recognition abilities. (Several referees said similar things on our way to publication.) Specifically, this project began in earnest when I pattern-recognized a connection between the Lahn group's gene maps and my mental map of the distribution of tone languages. But I have been thinking for some time about the cognitive status of tone, paralanguage and other non-sequential linguistic features (see http://www.ling.ed.ac.uk/~bob/lhulme.html for more detail); Dan's PhD research, starting from entirely different premises based on evolutionary genetics and the study of human prehistory, was looking for evidence of gene-language correlations of exactly the sort we've documented; and of course, we knew that ASPM and Microcephalin are involved in brain development. So if it was a hunch, it was a reasonably well-grounded hunch.

Now, it's certainly true, as Mark says, that our geographical correlations would mean more if they had proceeded from some experimental demonstration of some sort of genetically linked, language-related, cognitive/behavioral/perceptual difference. But given the widespread assumption (rooted in the Boasian tradition, but with a significant contemporary boost from Chomsky) that the human language faculty is absolutely uniform across the species, it's very unlikely that we would have been able to get funding to look for such a difference first. So we started by doing something we could do on our own without such support, namely testing the apparent correlation. Having done that, we hope we are now in a better position to apply for funding for the expensive part of the research. This might seem backwards, but it's a pretty common way of doing genetic mapping studies: start from your phenotype, use correlational studies to identify plausibly associated genetic markers, and then try to understand experimentally what the genetic markers actually do.

If we manage to make progress on that front, we will certainly let everyone know.

Bob Ladd


[Guest post by Bob Ladd]

Posted by Mark Liberman at June 2, 2007 11:45 AM