I hate this. The world is full of brilliant, promising advances in science and technology, and here I am again, debunking silly overgeneralizations and misinterpretations and even downright charlatanry. Look, my next couple of posts about science or engineering will be full of praise for excellent and insightful research, I promise. But I'm stuck with this one. After taking David Brooks to task for misuse of cognitive neuroscience, I thought I ought to track down the source of the problem. And once I did, I realized that Brooks is not the culprit. That doesn't mean that what he wrote was true. But he took his ideas hook, line and sinker from a recent book by Leonard Sax, M.D., Ph.D.: "Why Gender Matters: What Parents and Teachers Need to Know about the Emerging Science of Sex Differences".
This book has influenced many others besides David Brooks -- it was featured in a Time Magazine cover story in 2005, and Stanley Kurtz praised it in the National Review, and Sax is a leader in the movement for single-sex education -- so I thought I should get a copy and read it. It's an interesting book, and full of ideas worth thinking about. But judging from my experience with the particular factual claim that Brooks took from this book, you'd be wise to keep a very big sack of grains of salt within easy reach when you read it.
Here's what Brooks wrote:
...the part of the brain where men experience negative emotion, the amygdala, is not well connected to the part of the brain where verbal processing happens, whereas the part of the brain where women experience negative emotion, the cerebral cortex, is well connected.
As I explained, this claim about functional localization is false, and can be shown to be so in a few minutes' reading of the scientific literature. So how did Brooks go so far wrong? On p. 29 of "Why Gender Matters", Sax writes:
Girls and boys behave differently because their brains are wired differently.
Deborah Yurgelen-Todd and her associates at Harvard have used sophisticated MRI imaging to examine how emotion is processed in the brains of children from the ages of seven through seventeen. In young children, these researchers found that negative emotional activity in response to unpleasant or disturbing visual images seems to be localized in phylogenetically primitive areas deep in the brain, specifically in the amygdala. (A phylogenetically primitive area of the brain is one that hasn't changed much in the course of evolution: it looks pretty much the same in humans as it does in mice.) That may be one reason why it doesn't make much sense to ask a seven-year-old to tell you why she is feeling sad or distressed. The part of the brain that does the talking, up in the cerebral cortex, has few direct connections to the part of the brain where the emotion is occurring, down in the amygdala.
In adolescence, a larger fraction of the brain activity associated with negative emotion moves up to the cerebral cortex. That's the same division of the brain associated with our higher cognitive functions -- reflection, reasoning, language, and the like. So, the seventeen-year-old is able to explain why she is feeling sad in great detail and without much difficulty (if she wants to).
But that change occurs only in girls. In boys the locus of brain activity associated with negative emotion remains stuck in the amygdala.42 In boys there is no change associated with maturation. Asking a seventeen-year-old boy to talk about why he's feeling glum may be about as productive as asking a six-year-old boy the same question. [emphasis added]
That superscript 42 is not the answer to life, the universe and everything. It's just an endnote, and it resolves to a reference to a particular scientific paper, namely Killgore, William D. S. CA; Oki, Mika; Yurgelun-Todd, Deborah A. "Sex-specific developmental changes in amygdala responses to affective faces." Neuroreport. 12(2):427-433, February 12, 2001. A footnote or endnote like that, as I'm sure you know, is how authors flag the authority by which they make non-obvious statements. And the claim that adult males are emotional children is certainly non-obvious -- whatever current sexual stereotypes may say.
So I tracked down the Killgore et al. paper and read it. I'm going to share the results with you, because the disproportion between the reported facts and Sax's interpretation is spectacular. (I've also taken the liberty of making a .pdf of the paper available behind the link involved -- if writers like Sax and pundits like Brooks are going to make public-policy recommendations on the basis of a piece of U.S.-government funded research, then the U.S. public should be able to read the research reports.)
Let's start with the "materials and methods". Killgore, Oki and Yurgelun-Todd used functional magnetic resonance imaging (fMRI) to measure changes in blood flow in certain parts of the brain, between periods when subjects were looking at certain pictures and periods when they were looking at a small white circle.
Visual stimuli consisted of six fearful faces selected from the stimulus set of Ekman and Friesen.
The screen was visible via a mirror mounted to the head coil. Each 150 s scanning sequence consisted of five alternating 30 s stimulus/rest periods. ... During baseline and rest periods, subjects were asked to visually fixate on a small white circle located in the center of the screen.
Here's the first of Sax's overinterpretations. The stimuli were "six fearful faces", yet Sax talks about "unpleasant or disturbing visual images", and "the locus of brain activity associated with negative emotion", and "feeling glum", and so on. But looking at the faces of other people expressing fear and being depressed yourself are very different kinds of emotional experience; and there's a larger set of equally diverse negative feelings like anger, disgust, envy, bitterness, grief and so on -- none of which were involved in this experiment.
(There's also a more general issue with experimental design here. There's apparently nothing about the design that guarantees that we're not looking at the effects of seeing faces (or complex visual stimuli) of any sort, since the comparison is between "fearful faces" and "small white circle(s)". As far as I can tell, the experiment was entirely passive -- the subjects were not asked to perform any analysis, or remember anything, or to attend to these rather boring displays in any particular way at all. And Sax himself claims elsewhere in his book that males and females perceive faces and even general visual stimuli in very different ways.)
The subjects in the experiment were
19 healthy children and adolescent volunteers (13 right- and six left-handed by self-report), ranging in age from 9 to 17 years ... The sample included nine males and 10 females ...
Here's the second of Sax's overinterpretations. This is a very small sample, especially because the experimenters want to draw conclusions not only about the effects of sex but also about the effects of age. Worse, the small samples of males and females cover the range of ages rather differently. In the data plots (see below), we can determine that the boys were 11 to 15 -- specifically 1 at 11, 2 at 12, 4 at 13, 1 at 14 and 1 at 15. In other words, six of the nine boys were 12 or 13 years old. ALL the evidence about maturation effects depends on the other three subjects -- and we'll see below that the amount of individual variation is so large that we'd want 10 or 20 subjects at each age before concluding much -- and in any case, this tiny sample of boys only covers the span from 11 to 15 years old.
But Sax concludes that "[a]sking a seventeen-year-old boy to talk about why he's feeling glum may be about as productive as asking a six-year-old boy the same question". Note that the sample of girls was also very small: there were 10 girls, distributed as 1 at 9, 3 at 12, 2 at 15, 2 at 16, and 2 at 17. This spans a larger range of ages (9-17 instead of 11-15), but the number of subjects at each age is still tiny -- and as we'll see, that makes it impossible to draw any reliable general conclusions about the effects of maturation, because of the (very large) individual differences among subjects of the same sex and age.
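To make the sampling problem concrete, here is a small Python tally of the ages just listed, as I read them off the data plots. The variable names are mine, and nothing here comes from the paper beyond those counts.

```python
from collections import Counter

# Ages read off the paper's data plots, as listed in the text above.
boys = [11] + [12] * 2 + [13] * 4 + [14] + [15]
girls = [9] + [12] * 3 + [15] * 2 + [16] * 2 + [17] * 2

for label, ages in (("boys", boys), ("girls", girls)):
    counts = Counter(ages)
    print(f"{label}: n={len(ages)}, ages {min(ages)}-{max(ages)}, "
          f"largest group at any one age = {max(counts.values())}")

# Share of boys aged 12 or 13, and the ages with no boys at all
# inside the 9-17 range covered by the study.
share_12_13 = sum(1 for a in boys if a in (12, 13)) / len(boys)
missing_boy_ages = [a for a in range(9, 18) if a not in boys]
print(f"boys aged 12-13: {share_12_13:.0%}")
print(f"ages with no boys: {missing_boy_ages}")
```

Two thirds of the boys sit at 12 or 13, and there are no boys at all at 9, 10, 16 or 17, which is exactly where a maturation story would need them.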
What effects of these (limited) stimuli on this (tiny) set of subjects were measured?
Regions of interest (ROIs) for each amygdala were selected with reference to an anatomic atlas. Each ROI was comprised of four pixels, each pixel 3 × 3 mm, sampled from one axial slice... The amygdala ROI's were placed in medial aspects of the amygdala on an axial slice that included the subcallosal area (Brodmann's area 25) and the inferior regions of the middle and superior temporal gyrus. Two ROI's were placed in the dorsolateral prefrontal cortex (Brodmann's areas 46 and 9), localized anterior to the cingulate cortex at the approximate level of the genu of the corpus callosum.
Here's Sax's third overinterpretation. Sax talks about "the locus of brain activity associated with negative emotion" and "the same division of the brain associated with our higher cognitive functions" -- but the experiment didn't look for such loci in general, it looked only in two very small (four-voxel) "regions of interest", namely a particular small piece of the (paired left and right) amygdalas and a particular small piece of the dorso-lateral prefrontal cortex. The brain was imaged in an array of 12x64x128 = 98,304 voxels, of which only 4+4+4+4 = 16 were examined at all. These 16 little bits of brain were selected before the experiment began, on the basis of the researchers' expectations about what parts of the brain were relevant; the rest of the brain was ignored. That's normal procedure in some kinds of fMRI experiments, but it's important not to interpret the results as if activity in the whole brain had been evaluated.
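For a sense of proportion, the arithmetic above can be spelled out in a few lines of Python. The grid dimensions and ROI sizes are the ones just given; the variable names are mine.

```python
# Imaging grid and ROI sizes as stated in the text above.
total_voxels = 12 * 64 * 128
roi_voxels = 4 * 4  # four ROIs (left/right amygdala, left/right DLPFC), four voxels each

fraction = roi_voxels / total_voxels
print(f"{roi_voxels} of {total_voxels:,} voxels = {fraction:.4%} of the imaged volume")
```

That is roughly one voxel in every six thousand.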
The experimenters then averaged the "signal measured in all pixels in each ROI for each time point during the task activation period", which comprised "five alternating 30 s stimulus/rest periods". In other words, they took the signal over the periods of time when the subject was looking at (the same six) fearful faces over and over again, and expressed it relative to the signal over the periods of time when the subject was looking at a small white circle:
The MR signal was then normalized to each subject's baseline average, derived from the mean of the first seven images, and converted into a metric representing the percent change in MR signal from baseline.
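The normalization described in that passage amounts to a one-line calculation. Here is a minimal Python sketch of it. The function name and the sample intensities are invented purely for illustration (the paper does not publish its raw values), and I'm assuming the obvious reading of "percent change in MR signal from baseline".

```python
from statistics import mean

def percent_change_from_baseline(signal, n_baseline=7):
    """Express each image's ROI signal as percent change from the
    mean of the first n_baseline images (hypothetical helper)."""
    baseline = mean(signal[:n_baseline])
    return [100.0 * (s - baseline) / baseline for s in signal]

# Made-up ROI intensities: seven baseline images, then three task images.
roi_signal = [100.0] * 7 + [101.0, 102.0, 99.0]
task_changes = percent_change_from_baseline(roi_signal)[7:]
print(task_changes)
```

The point to notice is that every number in the plots is a ratio against a handful of baseline images, so noise in those first seven images feeds directly into each subject's score.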
So let's look at the results. Here's the percentage comparison to baseline for the left and right amygdalas in the boys:
Each dot plots the percentage difference, for one boy, in blood flow (in four amygdala voxels) between watching "fearful faces" and watching "small white circles". The left-hand plot is for the left amygdala, and the right-hand plot is for the right amygdala. The horizontal axis is age, and the vertical axis is percent difference. Look at the range of variation for the four boys at age 13, and the small number of subjects and small range of ages, and you won't be surprised that the trends were not found to be statistically significant (though as you can see, the fitted trend line is going down with age for the left amygdala).
And here are the same graphs for the girls:
Again, there's quite a bit of individual variation -- look at the three 12-year-olds. This time, the authors claim that the correlation between age and signal intensity in the left amygdala is statistically significant. Whether it's meaningful is another matter: a lot depends on that one 9-year-old girl, whose point has a lot of leverage. And in interpreting the difference between the boys and the girls, the fact that the girls have a much larger age range makes it a lot easier for their data to turn out to have a statistically significant trend (whether genuinely or by accident).
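That leverage can be quantified from the ages alone, without the signal values. In a straight-line regression with an intercept, the leverage of point i is h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. Here's a short Python sketch applying that textbook formula to the girls' ages as I read them off the plots. The 2p/n cutoff is a common rule of thumb that I'm bringing in, not something the paper uses.

```python
from statistics import mean

# Girls' ages as read off the plots: 1 at 9, 3 at 12, 2 each at 15, 16, 17.
girl_ages = [9, 12, 12, 12, 15, 15, 16, 16, 17, 17]

def leverages(x):
    """Hat-matrix diagonal for a straight-line fit with intercept."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

h = leverages(girl_ages)
rule_of_thumb = 2 * 2 / len(girl_ages)  # 2p/n with p = 2 fitted parameters

for age, h_i in zip(girl_ages, h):
    flag = "  <-- high leverage" if h_i > rule_of_thumb else ""
    print(f"age {age:2d}: leverage {h_i:.2f}{flag}")
```

With these ages, the 9-year-old's leverage comes out at about 0.50 against an average of 0.20, and she is the only point above the cutoff.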
You can look at the rest of the details for yourself, but I can't resist putting up one more example -- the alleged interaction of sex and age in predicting the difference between dorsolateral prefrontal cortex and amygdala signals:
Again, (c) and (d) are the left and right sides for the boys, while (e) and (f) are the left and right sides for the girls. The authors tell us that
For males considered as a group, there was no significant correlation between age and DLPFC–amygdala difference scores for the (c) left (r = -0.43, ns) or the (d) right (r = 0.40, ns). Females, in contrast, showed a significant correlation between the DLPFC-amygdala difference score on the (e) left (r = 0.73, p = 0.02), but not the (f) right (r = 0.08, ns).
They conclude that
The difference in the observed trajectories between the males and females was significant and suggests that adolescent maturation may involve sexually dimorphic development of prefrontal cortex-amygdala circuits involved in affective processing.
That conclusion could be true -- but would anyone like to place a small wager on how often I can get a random number generator to produce results that look more or less like these, for a simple model of the distribution of signal levels in which there are no sex effects at all? How about a model where the only sex difference is the age of onset of puberty? Or one in which the sex/age effect is on willingness to pay attention in boring experiments? Actually, make that a big wager.
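Here's roughly what I have in mind for the simplest version of that wager, as a Python sketch. It keeps the subjects' ages as read off the plots, draws every signal value from one and the same normal distribution (so there is no age effect and no sex effect by construction), and counts how often at least one of the four sex-by-side correlations with age comes out "significant" at the two-tailed 0.05 level. The noise model, the trial count and the function names are my assumptions; the critical t values are the standard ones for 7 and 8 degrees of freedom.

```python
import math
import random

BOY_AGES = [11, 12, 12, 13, 13, 13, 13, 14, 15]
GIRL_AGES = [9, 12, 12, 12, 15, 15, 16, 16, 17, 17]
# Two-tailed 5% critical values of Student's t for df = n - 2.
T_CRIT = {7: 2.365, 8: 2.306}

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def looks_significant(ages, rng):
    """Correlate ages with pure noise; report whether |t| beats the cutoff."""
    noise = [rng.gauss(0.0, 1.0) for _ in ages]
    r = pearson_r(ages, noise)
    df = len(ages) - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return abs(t) > T_CRIT[df]

def false_alarm_rate(trials=5000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Four tests per simulated experiment: left and right side for each sex.
        tests = [looks_significant(ages, rng)
                 for ages in (BOY_AGES, BOY_AGES, GIRL_AGES, GIRL_AGES)]
        hits += any(tests)
    return hits / trials

rate = false_alarm_rate()
print(f"at least one 'significant' age trend in {rate:.1%} of null experiments")
print(f"expected for four independent 5% tests: {1 - 0.95 ** 4:.1%}")
```

This covers only the amygdala trends. Adding the DLPFC-amygdala difference scores, or the puberty-onset and attention variants, would mean more tests and more chances for an accidental pattern.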
In fact, Killgore, Oki and Yurgelun-Todd pull their punch (after delivering it, and probably because some reviewer made them do it, but still):
Given that our results are preliminary and were obtained with a relatively small sample, conclusions based on these findings must be viewed as tentative until replicated with larger groups of subjects. Future studies would benefit from the inclusion a comparison group of adults so that the trajectory of amygdala response may be examined beyond the adolescent years. Secondly, functional imaging studies have consistently shown that the amygdala rapidly habituates to affective stimuli, resulting in reduced BOLD signal in studies that employ a blocked stimulus presentation paradigm [3,29]. As our study included a blocked presentation, we may have minimized our ability to detect amygdala activation, and future studies may benefit from the use of event related designs. Another potential limitation was that the ROIs used in the present study were limited to four pixels selected from a single coronal slice for each region. It is therefore possible that some regions that are critical for emotional regulation and processing were not adequately sampled.
And that underlines Sax's fourth and largest overinterpretation. He takes a very small study, with very limited stimuli, whose results are messy at best and completely equivocal at worst, but certainly show at least as much individual variation as variation by sex. And he presents this study as if it showed, unequivocally and categorically, that
Girls and boys behave differently because their brains are wired differently.
And more specifically, he tells this clear, coherent, categorical -- and completely bogus -- story:
In young children, ... negative emotional activity in response to unpleasant or disturbing visual images seems to be localized in phylogenetically primitive areas deep in the brain, specifically in the amygdala. ...
In adolescence, a larger fraction of the brain activity associated with negative emotion moves up to the cerebral cortex. That's the same division of the brain associated with our higher cognitive functions -- reflection, reasoning, language, and the like. ...
But that change occurs only in girls. In boys the locus of brain activity associated with negative emotion remains stuck in the amygdala. In boys there is no change associated with maturation.
Now, there are probably group differences by sex and age in emotional processing. And Sax might be right to argue that single-sex education is a good idea. But in presenting this narrative of males as emotional children, Sax is not telling us about the established conclusions of scientific research, despite his display of powerful authority-symbols ("her associates at Harvard", "sophisticated MRI imaging"). He's projecting his own prejudices onto a small and limited experiment with equivocal results, which disagree in part with other experiments (like the one I surveyed in my earlier post).
Leonard Sax should be ashamed of himself for trying to use such spectacularly overinterpreted science to advance his social agenda. Professors like me should be ashamed for not educating more of our public intellectuals to be able to evaluate such advocacy in a sensible and responsible way -- I'm sorry to say that Sax is a graduate of the University of Pennsylvania, where I teach.
And journalists have a special responsibility here, which they almost entirely fail to live up to. It's specifically shameful that Time Magazine couldn't assign a reporter willing and able to check Sax's science. And David Brooks, who disarmingly describes himself as a "scientific imbecile", should be ashamed for not taking on an intern who can read and understand the scientific research that he wants to use to support his conclusions about public policy. But then again, Brooks is a political commentator, and so his goal is presumably to argue for the conclusions that he prefers, not to seek the truth. All the more reason for the rest of us to do the intellectual due diligence that he avoids.
[Update -- several readers have written to suggest that I was too kind to Killgore et al. Barbara Z. pointed out that
One factor which you did not mention from the original research is the fact that about one third of the children tested were left-handed by self-report. There may be differences in brain function between right-handed and left-handed children and adults which would only be confounded here by the small sample of both sexes.
Indeed, 6 out of 19 subjects were left-handed -- that's more than you'd expect by chance, I think. The authors don't tell us how the left-handers were distributed by sex, but in any case, inclusion of so many left-handers (maybe of any left-handers at all) seems inappropriate in a study with such small N, where conclusions about functional lateralization are being drawn.
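For what it's worth, here is a rough Python check of that hunch. It assumes a base rate of left-handedness of about 10%, which is a commonly quoted figure but my assumption, not something from the paper, and it treats the 19 subjects as independent draws.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

ASSUMED_BASE_RATE = 0.10  # assumption: roughly 1 in 10 people are left-handed
p_tail = prob_at_least(6, 19, ASSUMED_BASE_RATE)
print(f"P(6 or more left-handers among 19) = {p_tail:.4f}")
```

Under that assumption the probability comes out under one percent, so the sample does look enriched for left-handers, whatever the reason.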
And another reader, who wishes to remain anonymous for the moment, wrote that
[L]ooking at their figures, I'll be damned if all their results aren't driven by that one nine year old girl; her leverage has to be very large. I'd be very surprised if standard regression diagnostics didn't throw up a huge warning flag here. But they don't give any details on their statistical procedure, and they don't make their data available.
I'm also curious about the robustness of their statistical analysis; and with merely 19 subjects, it would have been pretty easy for them to present the data as a table of numbers. In my opinion, it's a black eye for the scientific profession that journals like NeuroReport don't routinely require authors to publish the numbers needed to check their analyses. ]
[Note: more discussion and links on this topic are here.]
Posted by Mark Liberman at June 24, 2006 06:19 AM