March 02, 2008

Scrupulously avoiding sigma

The cover story in today's NYT Magazine is Elizabeth Weil, "Teaching Boys and Girls Separately". Unsurprisingly, it features the ideas of Leonard Sax, a tireless advocate for single-sex education. Weil's story starts like this:

On an unseasonably cold day last November in Foley, Ala., Colby Royster and Michael Peterson, two students in William Bender's fourth-grade public-school class, informed me that the class corn snake could eat a rat faster than the class boa constrictor. Bender teaches 26 fourth graders, all boys. Down the hall and around the corner, Michelle Gay teaches 26 fourth-grade girls. The boys like being on their own, they say, because girls don't appreciate their jokes and think boys are too messy, and are also scared of snakes. The walls of the boys' classroom are painted blue, the light bulbs emit a cool white light and the thermostat is set to 69 degrees. In the girls' room, by contrast, the walls are yellow, the light bulbs emit a warm yellow light and the temperature is kept six degrees warmer, as per the instructions of Leonard Sax, a family physician turned author and advocate who this May will quit his medical practice to devote himself full time to promoting single-sex public education.

Because of efforts by Dr. Sax and others, the traditional boy/girl issues like messiness, snakes, and different taste in jokes are now bolstered by starkly posed findings from psychophysics and neuroscience. These scientific arguments for sex-segregated education often seem to be careless and misleading at best -- for a sample, see "Are men emotional children?" (6/24/2006), "Leonard Sax on hearing" (8/22/2006), "Girls and boys and classroom noise" (9/9/2006).

I'm not going to repeat the debunking exercise this morning (though Dr. Sax has sent me several newer studies about sex differences in hearing and in vision, which are equally unable to support the conclusions that he wants to draw from them). Instead, I'd like to applaud Elizabeth Weil for including in her article a helpful quote from Jay Giedd about the comparison of sampled distributions:

Scans of boys' and girls' brains over time also show they develop differently. Analyzing data from the largest pediatric neuro-imaging study to date — 829 scans from 387 subjects ages 3 to 27 — researchers from the National Institute of Mental Health found that total cerebral volume peaks at 10.5 years in girls, four years earlier than in boys. Cortical and subcortical gray-matter trajectories peak one to two years earlier in girls as well. This may sound very significant, but researchers claim it means nothing for educators, or at least nothing yet. "Differences in brain size between males and females should not be interpreted as implying any sort of functional advantage or disadvantage," the N.I.M.H. paper concludes. Not one to be deterred, Sax invited Jay Giedd, chief of brain imaging at the Child Psychiatry Branch at N.I.M.H., to give the keynote address at his N.A.S.S.P.E. conference in 2007. Giedd spoke for 90 minutes, but made no comments on schooling at all.

One reason for this, Giedd says, is that when it comes to education, gender is a pretty crude tool for sorting minds. Giedd puts the research on brain differences in perspective by using the analogy of height. "On both the brain imaging and the psychological testing, the biggest differences we see between boys and girls are about one standard deviation. Height differences between boys and girls are two standard deviations." Giedd suggests a thought experiment: Imagine trying to assign a population of students to the boys' and girls' locker rooms based solely on height. As boys tend to be taller than girls, one would assign the tallest 50 percent of the students to the boys' locker room and the shortest 50 percent of the students to the girls' locker room. What would happen? While you'd end up with a better-than-random sort, the results would be abysmal, with unacceptably large percentages of students in the wrong place. Giedd suggests the same is true when educators use gender alone to assign educational experiences for kids. Yes, you'll get more students who favor cooperative learning in the girls' room, and more students who enjoy competitive learning in the boys', but you won't do very well. Says Giedd, "There are just too many exceptions to the rule." [emphasis added]
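Giedd's thought experiment can be made concrete with a back-of-the-envelope calculation. The sketch below rests on assumptions that are mine, not his: two equal-sized groups, each normally distributed with the same standard deviation, and means that differ by d standard deviations. Under those assumptions, the "tallest 50 percent" cutoff falls midway between the two means, and the fraction of students sent to the wrong room is the normal tail area beyond d/2. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def misassigned_fraction(d: float) -> float:
    """Fraction of students placed in the wrong room by a 50/50 split.

    Assumes two equal-sized normal groups with equal SDs whose means
    differ by d standard deviations, so the cutoff sits midway between
    the means (an idealization, not a claim about real data).
    """
    return normal_cdf(-d / 2.0)


# Effect sizes mentioned in the quote: about 1 SD (brain imaging and
# psychological testing) and about 2 SD (adult height).
for d in (1.0, 2.0):
    print(f"d = {d:.1f}: about {misassigned_fraction(d):.0%} misassigned")
```

On these idealized assumptions, a two-standard-deviation difference still sends roughly one student in six to the wrong room, and a one-standard-deviation difference sends nearly one in three.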

There's only one problem with Giedd's excellent example: the cited height difference does not really hold for kids in most of the age range under discussion, as any parent of a child in the first through tenth grades will recognize. But looking a little more closely may help to underline his point. The NIST Anthrokids database gives height (in centimeters) from a sample of roughly 100 to 140 females and males in each of 16 age ranges, measured in 1977:

Age       | Male mean | Male SD | Female mean | Female SD | Difference in cm (male - female) | Effect size (difference divided by pooled SD)
----------|-----------|---------|-------------|-----------|----------------------------------|----------------------------------------------
2.0-3.5   | 94.5      | 5.0     | 92.1        | 4.7       | 2.4                              | 0.49
3.4-4.5   | 101.3     | 4.5     | 101.6       | 4.6       | -0.3                             | -0.07
4.5-5.5   | 108.6     | 4.7     | 108.0       | 4.6       | 0.6                              | 0.12
5.5-6.5   | 115.1     | 5.2     | 114.2       | 5.1       | 0.9                              | 0.17
6.5-7.5   | 122.0     | 5.1     | 120.5       | 5.7       | 1.5                              | 0.28
7.5-8.5   | 127.8     | 5.6     | 125.9       | 5.5       | 1.9                              | 0.34
8.5-9.5   | 133.4     | 6.1     | 132.7       | 5.9       | 0.7                              | 0.12
9.5-10.5  | 137.9     | 6.3     | 137.5       | 6.2       | 0.4                              | 0.06
10.5-11.5 | 142.5     | 5.3     | 144.2       | 7.6       | -1.7                             | -0.26
11.5-12.5 | 148.4     | 7.4     | 149.3       | 7.0       | -0.9                             | -0.12
12.5-13.5 | 154.1     | 8.4     | 155.1       | 7.0       | -1.0                             | -0.13
13.5-14.5 | 161.3     | 8.7     | 158.4       | 7.0       | 2.9                              | 0.37
14.5-15.5 | 166.4     | 8.7     | 162.0       | 6.4       | 4.4                              | 0.58
15.5-16.5 | 174.5     | 7.8     | 162.1       | 6.1       | 12.4                             | 1.77
16.5-17.5 | 175.9     | 6.0     | 162.5       | 5.9       | 13.4                             | 2.25
17.5-19.0 | 177.1     | 6.8     | 163.0       | 5.9       | 14.1                             | 2.21

So among the 17- to 19-year-olds, the males really are two standard deviations taller. But up through age 16.5, at least for kids like those in the NIST sample, assigning locker rooms on the basis of height would be even closer to random than Dr. Giedd suggested.
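For readers who want to check the effect-size column, here is a minimal sketch of the calculation. It assumes that "pooled SD" means the root mean square of the two group SDs, which is appropriate when the male and female samples are about the same size; the exact pooling formula is not spelled out above, so treat that choice as an assumption. The function name and the selection of rows are mine, and small discrepancies from the table are possible because the inputs here are already rounded.

```python
from math import sqrt


def effect_size(male_mean: float, male_sd: float,
                female_mean: float, female_sd: float) -> float:
    """Difference in means (male - female) divided by a pooled SD.

    The pooled SD here is sqrt((sd_m^2 + sd_f^2) / 2), an equal-weight
    pooling that assumes the two samples are of similar size.
    """
    pooled_sd = sqrt((male_sd ** 2 + female_sd ** 2) / 2.0)
    return (male_mean - female_mean) / pooled_sd


# Three rows copied from the table: (male mean, male SD, female mean, female SD)
rows = {
    "2.0-3.5": (94.5, 5.0, 92.1, 4.7),
    "10.5-11.5": (142.5, 5.3, 144.2, 7.6),
    "16.5-17.5": (175.9, 6.0, 162.5, 5.9),
}

for age, row in rows.items():
    print(f"{age}: effect size {effect_size(*row):.2f}")
```

For these three rows the sketch reproduces the tabulated values (0.49, -0.26, and 2.25) to two decimal places.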

The term "standard deviation" comes up more often in the New York Times than you might think. But it's rare -- maybe unprecedented -- for one article in the New York Times Magazine to use it three times in a meaningful way, as Weil's piece does. Two of the instances are in the quotation from Dr. Giedd given above. The third one is in a quote from me:

Sax initially built his argument that girls hear better than boys on two papers published in 1959 and 1963 by a psychologist named John Corso. Mark Liberman, a linguistics professor at the University of Pennsylvania, has spent a fair amount of energy examining the original research behind Sax's claims. In Corso's 1959 study, for example, Corso didn't look at children; he looked at adults. And he found only between one-quarter and one-half of a standard deviation in male and female hearing thresholds. What this means, Liberman says, is that if you choose a man and a woman at random, the chances are about 6 in 10 that the woman's hearing will be more sensitive and about 4 in 10 that the man's hearing will be more sensitive.
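The step from "one-quarter to one-half of a standard deviation" to "about 6 in 10" can be sketched as follows. The assumptions are idealizations of mine rather than properties of Corso's data: both groups are normally distributed with the same standard deviation, and their means differ by d standard deviations. The difference between two independent draws is then normal with standard deviation sqrt(2), so the chance that the draw from the higher-scoring group wins is the normal CDF evaluated at d / sqrt(2). The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_more_sensitive(d: float) -> float:
    """Chance that a random member of the more sensitive group beats a
    random member of the other group.

    Assumes two normal distributions with equal SDs whose means differ
    by d standard deviations (an idealization for illustration only).
    """
    return normal_cdf(d / sqrt(2.0))


# The range of differences described in the quote.
for d in (0.25, 0.5):
    print(f"d = {d:.2f}: probability about {prob_more_sensitive(d):.2f}")
```

Under these assumptions the probability runs from about 0.57 to about 0.64 across that range, which is where "about 6 in 10" comes from.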

A surprisingly large fraction of the misapplications of science to public policy arise because most people in our society never learn simple techniques for thinking about differences in sampled distributions, and therefore have to fall back on pop-platonic ideas about properties of group archetypes. See here and here for some discussion of a different case:

The rhetoric of science journalism -- and sometimes the rhetoric of science -- all too easily engages a sort of pop-Platonism that seems to be deeply connected to the way that we think about natural kinds. As a result, small (but statistically reliable) differences in group distributions are seen as essential properties of the groups themselves, and therefore of all the individuals that make them up. Or at least, all the normal or typical individuals. Intellectual and social mischief often ensues.

If we look back at the uses of the term "standard deviation" in the NYT index, we find that most of the examples are presented as a species of arcane financial magic ("Historical volatility bears him out: he says that since 1993 the annualized standard deviation of returns, a customary measure of volatility, has been 22 to 23 percent in emerging markets, versus 13 to 14 percent for the Standard & Poor's 500-stock index.") or oddly-explained scientific jargon ("To physicists, the gold standard for a discovery is what they call a '5-sigma' bump, where sigma is a measure of bumpiness known as a standard deviation.").

In neither case is any understanding of the properties of distributions invoked. And in one telling case, the term "standard deviation" is used as a symbol of incomprehensibility (Anastasia Rubis, "Just your standard deviation", 3/20/2005):

As I skipped around in The Princeton Review, scrupulously avoiding permutations and standard deviation, certain synapses began firing for the first time in a quarter-century, even if the connections were as stiff as morning joints.

This is "a not-so-happy-housewife more than 20 years out of college, [who] had resolved to apply to graduate school for a teaching degree".

It's not a new idea to base legal, educational, or social prescriptions on scientific findings. It's not a bad idea, either, unless such arguments are based on bad science, or on good science badly applied. But I'm afraid that in today's educational policy debates -- and not just about segregation of the sexes -- the density of bad or misrepresented science is high and rising. In self-defense, our society needs to persuade people like Anastasia Rubis that standard deviations should not be so scrupulously avoided.

Posted by Mark Liberman at March 2, 2008 07:10 AM