November 24, 2007

Does Cosma deserve a refund?

Cosma Shalizi tears William Saletan up, blow-torches the shreds, and jumps up and down on the ashes. And then he starts in on Saletan's employers:

The editors of Slate have just demonstrated that they either cannot or will not do their job. Someone who reads a story there now must ask themselves "Is this appearing here because the editors are incapable of recognizing that it's worthless? Is this appearing here because the editors want to make propaganda, to manipulate me into believing something, truth be damned? Is this appearing here because the editors owed someone a favor, or wanted to get into someone's pants, or wanted to acquire a reputation for being edgy and contrarian, truth be damned?"

This is a great rant, and eminently well deserved -- but it seems to me that we all ought to ask these same three questions about everything we read. And all too often, the answers are going to be "yes, the editors are quantitatively illiterate", "yes, the editors have an ideological agenda", and "yes, the editors were motivated in part by personal affinities and by the desire to pander to their readers or to shock them".

It's still worthwhile to get angry about egregious examples. Writers and editors at places like Slate certainly care about their reputations, which means that they need to pretend to care about the truth, whether or not it actually ranks very high among their goals. A healthy intellectual ecosystem requires that mistakes sometimes be caught, or bullshit would reign supreme. And journalistic mistakes about science and mathematics -- unlike mistakes about geography, politics and commerce -- have traditionally (i.e. in pre-blog days) been free of consequences, since the people who know better could only mutter into their oatmeal.

Linguists are particularly sensitive to this problem, since we've failed so badly in our duty to educate contemporary intellectuals -- including writers and editors -- about syntax and phonology and so on. But the statisticians have done even worse. William Saletan's Slate series on racial differences in IQ should remind us that as a whole, our society is as willfully ignorant of basic statistical concepts as the Pirahã are of counting.

And just as people are confused by the ordinary-language meaning of terms like "sentence" and "vowel" and "modifier", so they often mistakenly think they understand terms like "correlation" and "factor" and "heritability" because these words have ordinary-language counterparts with vaguely analogous meanings. This is like believing that 7 means "small pile" and 25 means "big pile" -- sometimes the translation works, but often it doesn't. And reasoning based on these false translational equivalences can lead almost anywhere.

How about you? This is an important debate, and as a citizen of the world, you ought to try to understand the arguments for yourself, rather than just supporting the "expert" with the most evocative anecdotes, the best slogans, or the biggest bullhorn. The required mathematics is not very complex or difficult, and Cosma recently posted two long discussions of the relevant issues: "Yet More on the Heritability and Malleability of IQ" (9/27/2007), and "g, a Statistical Myth" (10/18/2007), which are free and accessible.

Cosma begins the first post (well, after some pro forma moaning about how depressing the whole thing is) by saying

I am going to assume that you know what "variance" and "correlation" are, but not too much else.

So to start with, you should ask yourself whether you can define and calculate the variance of a set of numbers, or the correlation between two sequences of numbers. If not, then read the (linked) Wikipedia articles -- and spend a little time playing with the concepts in the context of an interactive program like R. Once you've paid that entry fee, read Cosma's posts. (It's more fun than you might think -- I especially recommend the discussion of the heritability of zip codes, and you could go back and read the prequel about the heritability of accent.) And then go through William Saletan's articles, and decide for yourself what they mean about the abilities and motivations of the writer and his editors.

Too big a price to pay? OK. But watch out for that river trader who's telling you about how big a pile of brazil nuts he wants for how many pieces of cloth.

[Cosma replies:

I am not sure that "we all ought to ask these same three questions about everything we read" isn't an impossible (and questionable) ideal. Consider the difficulties of actually answering those questions in any particular case; they will often be substantial. And yet if the media ecology is functioning acceptably, the answers will either be "no", or perhaps better "not so much as to matter", and so the cost/benefit ratio is pretty skewed.

Well, perhaps I should have said "...keep these three questions in the background while processing everything we read", or something like that. Institutions have ideological biases and commitments, and so do individuals, including thee and me; we all have limitations of knowledge and skill; editors and journalists, like the rest of us, make choices in part based on personal connections and affinities or enmities.

Everything that we read is influenced, often significantly, by these factors. That doesn't mean that it's all crap -- it's not even true (in my opinion) that there's a strong negative correlation between quality and degree of influence. Ideological commitment can provide the motivation for difficult work with interesting results; ignorance can occasionally protect against the falsehoods that "everybody knows", or promote creativity by forcing someone to develop an unusual route to a solution. And personal networks are the fabric of cultural evolution, after all.

That's not to say that bigotry, ignorance and favoritism are always or even usually good things, or that truth is not the paramount virtue. But the problem with Saletan's series of articles, it seems to me, was a different one, namely irresponsibility. He used his powerful voice as Slate's "national correspondent" to address a sensitive and important issue that he doesn't understand very well, and to promote a point of view that has already done a lot of damage, and is likely to do more.]

[Update 11/27/2007 --

Cosma warned me that this discussion is easier to get into than to get out of, but I didn't listen.

I'm interested in the rhetoric of public discussion of science, and especially the rhetorical interpretation of statistical concepts; and I'm also interested in what you might call the ecology of journalism, especially as it influences public discussion of science. All this led me to link (above) to Cosma's rant against William Saletan's recent series on race and IQ. My basic observation was that Cosma has unrealistically high expectations about the relationship among journalism, expertise and the pursuit of truth. In passing, I observed that Cosma's earlier posts on heritability and g are an accessible tutorial on an important and often-misunderstood set of topics, and I urged readers to consider reading them.

This in turn has led P-ter at Gene Expression to take me to task, under the title "Linguist: I can use R, you can't. Thus, your motives are questionable. QED." The post says things like "Dr. Liberman assumes that Cosma concludes that heritability estimates are worthless. This is not the case."

A modest point in self-defense: I don't (and didn't) interpret Cosma's posts to mean that "heritability" and "g" are worthless concepts (though I think that they are easy to misuse individually, and often toxic in combination). All that I meant to say about these concepts is that the scientific versions are not at all the same as the ordinary-language versions, and that as a result, much of the discussion of the heritability of intelligence is confused and ill-founded. I'm well aware that simple mathematical models are often of great scientific and technological value. I spend a good deal of my time trying to persuade people that this is true, and teaching them how to put that belief into effect.

But there's a difference between using a map and thinking that it's equivalent in every way to the territory it describes. The only way to avoid serious conceptual errors is to understand the tools you're using, and the material you're using them on.

If you're going to debate the price of eggs, you should be able to count and to multiply. Of course, you can count and multiply, and still be completely out in left field on the price of eggs. But if you can't count and multiply, then you shouldn't waste our time with your arguments about why the conventional wisdom on egg-pricing is all wrong. If you insist on doing so anyhow, you'll forgive us for wondering why.

Similarly, if you're going to debate the heritability of g, you should understand variances (and the problems of estimating how to divide them up) and correlations (and the meaning of the results of factor analysis of positively-correlated variables). That's because heritability is a ratio of estimated variances; and g is a construct emerging from performing factor analysis on certain kinds of positively-correlated test results. If you don't understand those things, then you can't understand the literature on the heritability of intelligence, and you shouldn't waste our time with your arguments about why the conventional wisdom on that subject is all wrong.
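To make the first of those points concrete, here is a toy sketch of "heritability as a ratio of variances". It is written in Python purely for illustration. The numbers are invented, the function name `heritability` is hypothetical, and the clean additive split of each trait value into a "genetic" part and an exactly uncorrelated "environmental" part is an assumption of the example, not something estimated from real data.

```python
def variance(xs):
    """Mean squared difference from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def heritability(genetic, environmental):
    """Ratio of genetic variance to total trait variance.

    Toy assumption: each individual's trait value is simply
    genetic + environmental, with nothing else going on.
    """
    trait = [g + e for g, e in zip(genetic, environmental)]
    return variance(genetic) / variance(trait)


# Invented values for four individuals, chosen so that the genetic and
# environmental components are exactly uncorrelated.
genetic = [-1.0, -1.0, 1.0, 1.0]
narrow_env = [-0.5, 0.5, -0.5, 0.5]   # environments that differ a little
wide_env = [-1.0, 1.0, -1.0, 1.0]     # environments that differ a lot

h2_narrow = heritability(genetic, narrow_env)
h2_wide = heritability(genetic, wide_env)
print(h2_narrow, h2_wide)
```

With these made-up numbers the ratio comes out to 0.8 in the first case and 0.5 in the second, although the "genetic" values are identical in both. The ratio describes a particular population in a particular range of environments, which is one way in which the technical term differs from the ordinary-language word.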

The point is not that simple statistical models are useless, or that people who can't use R are barred from discussing them. What I actually said was that this is an important topic, and therefore you should want to learn about it, so as to come to an informed conclusion; and that Cosma's two long posts on the subject are fairly accessible -- except that the price of admission is to understand the meaning of "variance" and "correlation".

The definitions of those terms are simple -- variance is the mean squared difference from the mean, and correlation is the inner product of mean-corrected, length-normalized vectors. The equations are even simpler, as you can learn from the Wikipedia entries. But there's a catch -- you probably can't assimilate these simple definitions and equations unless you already know enough mathematics that you already know these definitions and equations. As a result, if you don't already know about variances and correlations -- or if you once sort of knew, but have more or less forgotten -- there's a problem.

So I suggested a way out. At least for someone like me, a good way to understand new concepts is to play around with them in a practical setting. And luckily there's free software that makes this pretty easy -- a lot easier than it used to be in the days of pencil and paper, or even electronic calculators -- easy enough that for simple concepts like variance and correlation, you can probably demystify them for yourself, from a standing start, in a couple of hours. That's where programs like R come in.
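For what it's worth, here is roughly what that kind of playing around might look like: a minimal sketch of the two definitions given above, spelled out step by step. I've written it in Python rather than R only for the sake of illustration, and the sample numbers are invented. Note that this follows the "mean squared difference" definition literally and so divides by n, whereas R's built-in `var()` divides by n - 1.

```python
import math


def variance(xs):
    """Mean squared difference from the mean (dividing by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def correlation(xs, ys):
    """Inner product of the mean-corrected, length-normalized vectors.

    Undefined (division by zero) if either sequence is constant.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Mean-correct: subtract each sequence's own mean.
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Length-normalize: divide each vector by its Euclidean length.
    nx = math.sqrt(sum(d * d for d in dx))
    ny = math.sqrt(sum(d * d for d in dy))
    # Inner product of the two unit-length vectors.
    return sum((a / nx) * (b / ny) for a, b in zip(dx, dy))


# Invented sample data.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(variance(xs))         # 1.25
print(correlation(xs, ys))  # 0.8, up to floating-point rounding
```

Changing the numbers and watching what happens to the results (reversing one sequence, say, or adding a constant to every value) is the kind of experiment I had in mind.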

This was all pretty compressed, I admit, and easy to misunderstand. However, my point was not to exclude anyone from the conversation, but rather to suggest how they can find a way in. If, after understanding the arguments on both sides, they agree with Saletan's conclusions, then we can argue about it. But if they don't understand what they're talking about, what's the point of arguing?

And just so that we're clear on what the issues are here, Josh Marshall reminds us of Saletan's basic premise, which is that the genetic mental inferiority of Africans is as well established as the theory of evolution is, and that well-meaning liberals who try to deny the genetic mental inferiority of Africans are pitting their faith against scientific fact, just like the well-meaning believers who don't want to accept evolution. Here's Saletan, as quoted by Marshall:

If this suggestion [i.e., the genetic mental inferiority of Africans] makes you angry—if you find the idea of genetic racial advantages outrageous, socially corrosive, and unthinkable—you're not the first to feel that way. Many Christians are going through a similar struggle over evolution. Their faith in human dignity rests on a literal belief in Genesis. To them, evolution isn't just another fact; it's a threat to their whole value system. As William Jennings Bryan put it during the Scopes trial, evolution meant elevating "supposedly superior intellects," "eliminating the weak," "paralyzing the hope of reform," jeopardizing "the doctrine of brotherhood," and undermining "the sympathetic activities of a civilized society."

The same values—equality, hope, and brotherhood—are under scientific threat today. But this time, the threat is racial genetics, and the people struggling with it are liberals.

Evolution forced Christians to bend or break. They could insist on the Bible's literal truth and deny the facts, as Bryan did. Or they could seek a subtler account of creation and human dignity. Today, the dilemma is yours. You can try to reconcile evidence of racial differences with a more sophisticated understanding of equality and opportunity. Or you can fight the evidence and hope it doesn't break your faith.

To repeat, this is not about whether "heritability" (in the technical sense of the ratio of genetically-associated variance to total variance) is ever or always a useful concept, or whether "intelligence" (in the technical sense of a linear combination of correlated test results) is ever or always a useful concept. It's not even about the difference between applying the technical concept of heritability to a metabolic disorder like diabetes and to a statistical construct like IQ. Rather, it's about Saletan's assertion that a starkly racist theory is just as scientifically well established as the evolution of species is.

Josh Marshall calls this "an equation of almost unparalleled absurdity". I agree, not out of political correctness, but out of scientific conviction. And if the point is going to continue to be debated, I want as many people as possible to understand the concepts involved, precisely because I believe that the scientific evidence supports my views, and refutes Saletan's.

Josh offers a suggestion about Saletan's motivation, or at least about the pattern of his rhetoric:

... right-wing inability to come to terms with modernity and modern science must be equalled by something on the other side of the aisle. [...] I'm not sure I've ever seen a more nonsensical example of that TNR-originating disease of facile contrarianism for its own sake.

This makes sense to me.

Saletan sees this as a debate between clear-sighted scientific racists and muddle-headed politically-correct do-gooders. As for P-ter at Gene Expression, I take it that he is just trying to defend the use of simple linear models in science in general, and in empirical population genetics in particular, from what he understood as a general attack. But both of them are wrong about the nature of the argument.]

Posted by Mark Liberman at November 24, 2007 01:07 PM