September 23, 2006

Gabby guys: the effect size

Are women really more talkative than men? A few minutes ago, I did a quick experiment that bears on the question, and the answer turned out to be "no".

The experiment was quick and easy, but it wasn't small, because I didn't need to collect any data: I used a published speech corpus. Specifically, I ran a couple of perl scripts over the transcripts and speaker demographics from the Fisher English Corpus Part 1 (FECP1), a collection of 5,850 telephone conversations lasting up to 10 minutes each, recorded in 2003. Speakers were from all over the U.S., ranged in age from teenagers to people in their 80s, and had educational levels from high school to post-graduate degrees. Participants were assigned conversational partners at random, and asked to talk for up to ten minutes on one of forty topics like "What do each of you think is the most important thing to look for in a life partner?", or "Do either of you think that you would commit perjury for a close friend or family member?". Calls were routed through a computer in Philadelphia, which recorded them with the knowledge and consent of both parties.

If you don't have much patience for numbers and graphs, here's the summary: in conversations between the sexes, the women used about 6% fewer words on average than the men did; and in about 55% of such conversations, the male participant talked more than the female participant did. In single-sex conversations, two guys exchanged about 3.2% more words, on average, than two gals did. For more details, read on -- and as a bonus, you'll learn, in exact quantitative terms, whether size really matters. Effect size, that is...

FECP1 includes 1,910 mixed-sex conversations. In 1,048 of them (54.9%) the male participant produced more words than the female participant did, while in 862 of them (45.1%) the female participant produced more words than the male participant did. The average number of words produced by a male participant in a mixed-sex conversation was 925.9, while the average number of words produced by a female participant was 866.6, or about 6.4% fewer.

In 2,368 FECP1 conversations where two women were talking, each participant on average produced 901.5 words. This is about 4% more than the average number of words produced by women talking with men. In 1,572 conversations where two males were talking, each participant on average produced 930.4 words. This is about half a percent more than the average number of words produced by men talking with women. So there's a small indication that women might be more talkative when talking with women -- and a smaller indication that men talk more with other men -- but the amount of change is not very large.
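The percentages in the two paragraphs above can be rechecked from the counts and averages they report. Here is a minimal Python sketch that uses only the figures quoted in this post; the variable and helper names are made up for illustration and have nothing to do with the original perl scripts.

```python
# Figures quoted above for the Fisher English Corpus Part 1 (FECP1).
mixed_total = 1910         # mixed-sex conversations
male_talked_more = 1048    # ... in which the man produced more words
female_talked_more = 862   # ... in which the woman produced more words

# Average words per participant, by conversation type.
male_with_female = 925.9
female_with_male = 866.6
female_with_female = 901.5
male_with_male = 930.4


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def pct_change(new, base):
    """Return the percentage by which new differs from base."""
    return 100.0 * (new - base) / base


share_male_more = pct(male_talked_more, mixed_total)
share_female_more = pct(female_talked_more, mixed_total)

# The same gap, expressed against two different baselines.
women_fewer_than_men = -pct_change(female_with_male, male_with_female)
men_more_than_women = pct_change(male_with_female, female_with_male)

# Same-sex versus mixed-sex conversations, and men versus women in same-sex ones.
women_same_vs_mixed = pct_change(female_with_female, female_with_male)
men_same_vs_mixed = pct_change(male_with_male, male_with_female)
men_vs_women_same_sex = pct_change(male_with_male, female_with_female)

for name, value in [
    ("male talked more (% of mixed-sex calls)", share_male_more),
    ("female talked more (% of mixed-sex calls)", share_female_more),
    ("women fewer words than men, mixed-sex (%)", women_fewer_than_men),
    ("men more words than women, mixed-sex (%)", men_more_than_women),
    ("women: same-sex vs. mixed-sex (%)", women_same_vs_mixed),
    ("men: same-sex vs. mixed-sex (%)", men_same_vs_mixed),
    ("men vs. women, same-sex (%)", men_vs_women_same_sex),
]:
    print(f"{name}: {value:.1f}")
```

Note that "about 6.4% fewer" (women relative to men) and "about 6.8% more" (men relative to women) describe the same 59.3-word gap; only the baseline differs.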

This graph of the distribution of word counts for all participants, female and male, shows how similar the distributions for the two sexes were:

And for completeness, here is the very similar graph of the word-count distributions for male and female participants in mixed-sex conversations:

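One way to measure the size of such group differences is to scale the difference between the group averages according to the amount of variation within each group's distribution. More technically, this is the difference between the means divided by the pooled standard deviation. Or in the form of an equation,

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the two group means and $s_{\text{pooled}}$ is the pooled standard deviation.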

This measure of "effect size" is known as Cohen's d. According to Wikipedia, Cohen (1992) suggested that d of "0.2 is indicative of a small effect, 0.5 a medium and 0.8 a large effect size". For the mixed-sex FECP1 conversations, the effect size of the difference between the number of words used by men and the number of words used by women (expressed in terms of Cohen's d) is 0.203. For all the conversations, the effect size is 0.128.
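For readers who want to try this on their own data, here is a minimal Python sketch of Cohen's d. It assumes one common definition of the pooled standard deviation (each group's sample variance weighted by its degrees of freedom); this post doesn't say which variant was used for the FECP1 numbers, so treat that choice as an assumption. The function names and the sample word counts are invented for illustration and are not corpus data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between the group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def rough_label(d):
    """Map |d| onto the benchmarks quoted above: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Invented word counts for five hypothetical speakers per group.
men = [900, 1100, 700, 1000, 950]
women = [850, 1050, 650, 1000, 900]

d = cohens_d(men, women)
print(f"d = {d:.3f} ({rough_label(d)})")

# The two effect sizes reported above, run through the same benchmarks.
print(rough_label(0.203), "/", rough_label(0.128))
```

The sign of d just reflects which group is listed first; the benchmarks apply to its absolute value.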

In other words, these are small to extra-small effects. But they're in the opposite direction from the predictions of Louann Brizendine's (unsubstantiated) claim that women normally produce almost three times more words per day than men, due to crucial biological differences allegedly laid down in the eighth week of fetal life:

A huge testosterone surge beginning in the eighth week will turn this unisex [fetal] brain male by killing off some cells in the communication centers and growing more cells in the sex and aggression centers. If the testosterone surge doesn't happen, the female brain continues to grow unperturbed. The fetal girl's brain cells sprout more connections in the communications centers and areas that process emotion. How does this fetal fork in the road affect us? For one thing, because of her larger communication center, this girl will grow up to be more talkative than her brother. Men use about seven thousand words per day. Women use about twenty thousand. For another, it defines our innate biological destiny, coloring the lens through which each of us views and engages the world. [From The Female Brain, p. 14 -- emphasis added]

However, these small-or-extra-small word-count effects are actually a bit larger than the effects that are generally found for differences in measures of verbal performance between males and females (though most measures show a small performance advantage for females). According to Janet Shibley Hyde and Marcia C. Linn, "Gender Differences in Verbal Ability: A Meta-Analysis", Psychological Bulletin, 104:1 53-69 (1988):

Many regard gender differences in verbal ability to be one of the well-established findings in psychology. To reassess this belief, we located 165 studies that reported data on gender differences in verbal ability. The weighted mean effect size (d) was +0.11, indicating a slight female superiority in performance. The difference is so small that we argue that gender differences in verbal ability no longer exist. Analyses of effect sizes for different measures of verbal ability showed almost all to be small in magnitude: for vocabulary, d = 0.02; for analogies, d = −0.16 (slight male superiority in performance); for reading comprehension, d = 0.03; for speech production, d = 0.33 (the largest effect size); for essay writing, d = 0.09; for anagrams, d = 0.22; and for tests of general verbal ability, d = 0.20. For the 1985 administration of the Scholastic Aptitude Test-Verbal, d = −0.11, indicating superior male performance. Analysis of tests requiring different cognitive processes involved in verbal ability yielded no evidence of substantial gender differences in any aspect of processing. Similarly, an analysis by age indicated no striking changes in the magnitude of gender differences at different ages, countering Maccoby and Jacklin's (1974) conclusion that gender differences in verbal ability emerge around age 11. For studies published in 1973 or earlier, d = 0.23 and for studies published after 1973, d = 0.10, indicating a slight decline in the magnitude of the gender difference in recent years.

Whatever the size of these effects, are they the direct result of genetic and hormonal effects on brain wiring during embryological (or later) development, as opposed to being a more indirect result of the different life experiences of females and males? (Of course such environmental effects would also be mediated by brain differences, unless we believe that life experiences affect us by modifying our immaterial souls.) The first answer to this question is that no one has a clue. The second answer to this question is that the effects are so small, and so variable according to circumstance, that the question becomes an academic one, in the exact sense of that term -- the answer is of interest to scientists, but it should have no public policy implications at all, except to make us suspicious of people like David Brooks and Leonard Sax.

Here are some comparisons that may help to put these effects and their sizes in perspective. One comparison involves group differences that are mostly genetic, and another involves differences that are mostly environmental.

1. Some secondary sex differences do involve medium-to-large effect sizes. For example, according to Table 5 of the National Center for Health Statistics' Anthropometric Reference Data for Children and Adults: U.S. Population 1999-2002, the average height of 19-year-old American males is 176.7 cm, with s.d. = 10.6, while the average height of 19-year-old females is 162.9 cm, with s.d. = 11.0. This is an effect size of d=1.32. For the same comparison of 19-year-old males and females, the effect size for the average difference in weight was only d=0.51 (because the standard deviations are much larger relative to the means).

2. Some environmental effects on cognitive performance involve medium-to-large effect sizes. Martha J. Farah et al. ("Childhood poverty: Specific associations with neurocognitive development", Brain Research 1110(1) 166-174, September 2006) "administered a battery of tasks designed to tax specific neurocognitive systems to healthy low and middle SES [socio-economic status] children screened for medical history and matched for age, gender and ethnicity".

Fig. 1. Effect sizes, measured in standard deviations of separation between low and middle SES group performance, on the composite measures of the seven different neurocognitive systems assessed in this study. Black bars represent effect sizes for statistically significant effects; gray bars represent effect sizes for nonsignificant effects.

All the participants in this study were African-American girls between the ages of 10 and 13. As the graph above indicates, the difference in performance on the "Language" part of the test battery between middle SES and low SES girls represented an effect size of about 0.95.

There were two language-related tasks:

Peabody Picture Vocabulary Test (PPVT)
This is a standardized vocabulary test for children between the ages of 2.5 and 18. On each trial, the child hears a word and must select the corresponding picture from among four choices.
Test of Reception of Grammar (TROG)
In this sentence–picture matching task designed by Bishop (1982), the child hears a sentence and must choose the picture, from a set of four, which depicts the sentence. Its lexical–semantic demands are negligible as the vocabulary is simple and a pre-test ensures that subjects know the meanings of the small set of words that occur in the test.

Note that in terms of effect size, this finding (about 0.95) is nearly three times as large as the largest difference found in the Hyde and Linn meta-analysis of sex differences in verbal ability (0.33, for speech production).


[A note about the overall numbers of participants of each sex in the Fisher English Part 1 corpus is in order. In this phase of the data collection, there were 5,850 conversations, and therefore 11,700 conversational sides. Of those, 6,646 (or 56.8%) were female, and 5,054 (or 43.2%) were male. This imbalance was caused by the fact that participants needed to be callable at a particular phone number during a particular time period. Thus people who don't work outside the home, or who are retired, are likely to be over-represented in the collection; and women in turn are over-represented in these two groups. In fact, we had to work hard to keep the imbalance of sexes in the collection from being larger.]
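As a sanity check, the conversation counts given earlier in this post (1,910 mixed-sex, 2,368 female-female, 1,572 male-male) should add up to the totals in this note. A short Python sketch of that bookkeeping, using only numbers quoted in the post (the variable names are mine):

```python
# Conversation counts by type, as quoted earlier in the post.
mixed_sex = 1910
female_female = 2368
male_male = 1572

conversations = mixed_sex + female_female + male_male
sides = 2 * conversations  # each conversation has two sides

# A same-sex conversation contributes two sides of that sex;
# a mixed-sex conversation contributes one side of each.
female_sides = 2 * female_female + mixed_sex
male_sides = 2 * male_male + mixed_sex

female_share = 100.0 * female_sides / sides
male_share = 100.0 * male_sides / sides

print(conversations, sides)
print(female_sides, f"{female_share:.1f}%")
print(male_sides, f"{male_share:.1f}%")
```

The per-type counts reproduce the 5,850 conversations, the 11,700 sides, and the 6,646 / 5,054 split, so the figures in the post are internally consistent.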


Posted by Mark Liberman at September 23, 2006 07:26 AM