July 08, 2007

What men and women blog about

Chris Brew writes, under the Subject line "BBC Watch":

This is probably the silliest contribution to the Brizendine discussion yet ("What women talk about", BBC News, 7/6/2007).

The original claim is false, they say. So let's poll readers to find which alternative spurious claims would most appeal to them.

I agree, this is a good example of the tabloidification of BBC science coverage.

If they cared about the answer to their question, instead of about pandering to their readers' prejudices, they might look at the research on the topic. This sort of research has become fairly easy to do, and quite a bit of it has been done.

For example, they might have read M. Koppel, J. Schler, S. Argamon and J. Pennebaker, "Effects of Age and Gender on Blogging" (in AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs), which looked at "all blogs accessible from blogger.com one day in August 2004, downloading each one that included author-provided indication of gender and at least 200 appearances of common English words". This created a corpus of "over 71,000" blogs, from which they created "a subcorpus consisting of an equal number of male and female blogs in each age group, by randomly discarding surplus documents in the larger category". The result was "a total of 37,478 blogs ... comprising 1,405,209 blog entries and 295,526,889 words".
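The balancing step described above, which keeps equal numbers of male and female blogs in each age group by randomly discarding the surplus in the larger category, can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The `(blog_id, gender, age_group)` tuple format, the function name `balance_by_age_and_gender`, and the fixed random seed are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def balance_by_age_and_gender(blogs, seed=0):
    """Downsample so every age group has equal numbers of male and female blogs.

    blogs: iterable of (blog_id, gender, age_group) tuples, where gender is
    "male" or "female" (a hypothetical input format assumed for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the random discard is reproducible
    groups = defaultdict(lambda: {"male": [], "female": []})
    for blog_id, gender, age_group in blogs:
        groups[age_group][gender].append(blog_id)

    balanced = []
    for age_group in sorted(groups):
        males = groups[age_group]["male"]
        females = groups[age_group]["female"]
        n = min(len(males), len(females))
        # Keep a random n from each category; the surplus in the larger one is dropped.
        balanced.extend((b, "male", age_group) for b in rng.sample(males, n))
        balanced.extend((b, "female", age_group) for b in rng.sample(females, n))
    return balanced
```

An age group with blogs from only one gender contributes nothing to the balanced subcorpus, which is one reason the balanced set (37,478 blogs) is so much smaller than the full download.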

The table below shows male vs. female word frequencies (per 10,000 words) for those "content-based unigrams" (i.e. content words) with the greatest information gain for gender in this sub-corpus, along with standard errors for the frequency estimates.

feature      male         female
linux        0.53±0.04    0.03±0.01
microsoft    0.63±0.05    0.08±0.01
gaming       0.25±0.02    0.04±0.00
server       0.76±0.05    0.13±0.01
software     0.99±0.05    0.17±0.02
gb           0.27±0.02    0.05±0.01
programming  0.36±0.02    0.08±0.01
google       0.90±0.04    0.19±0.02
data         0.62±0.03    0.14±0.01
graphics     0.27±0.02    0.06±0.01
india        0.62±0.04    0.15±0.01
nations      0.25±0.01    0.06±0.01
democracy    0.23±0.01    0.06±0.01
users        0.45±0.02    0.11±0.01
economic     0.26±0.01    0.07±0.01
shopping     0.66±0.02    1.48±0.03
mom          2.07±0.05    4.69±0.08
cried        0.31±0.01    0.72±0.02
freaked      0.08±0.01    0.21±0.01
pink         0.33±0.02    0.85±0.03
cute         0.83±0.03    2.32±0.04
gosh         0.17±0.01    0.47±0.02
kisses       0.08±0.01    0.28±0.01
yummy        0.10±0.01    0.36±0.01
mommy        0.08±0.01    0.31±0.02
boyfriend    0.41±0.02    1.73±0.04
skirt        0.06±0.01    0.26±0.01
adorable     0.05±0.00    0.23±0.01
husband      0.28±0.01    1.38±0.04
hubby        0.01±0.00    0.30±0.02
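The words in that list were selected for having the greatest information gain for gender. As a rough sketch of what that measure computes, the Python below treats each word as a binary feature (it either occurs in a given blog or it doesn't) and computes IG = H(gender) - H(gender | feature), in bits, from document counts. The presence/absence encoding, the function names, and the toy counts are assumptions for illustration; the paper's exact feature definition may differ.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:  # 0 * log(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(male_with, male_total, female_with, female_total):
    """IG (bits) of a hypothetical 'word occurs in blog' feature for gender."""
    total = male_total + female_total
    h_gender = entropy([male_total, female_total])

    male_without = male_total - male_with
    female_without = female_total - female_with
    n_with = male_with + female_with
    n_without = male_without + female_without

    # Conditional entropy: weighted entropy of gender within each feature value.
    h_conditional = 0.0
    if n_with:
        h_conditional += n_with / total * entropy([male_with, female_with])
    if n_without:
        h_conditional += n_without / total * entropy([male_without, female_without])
    return h_gender - h_conditional
```

With a gender-balanced corpus like the one above, H(gender) is exactly 1 bit, so a word that occurs in the same share of male and female blogs scores 0 (for example `information_gain(50, 100, 50, 100)`), while a word found in every male blog and no female blog would score the maximum of 1.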

Their conclusion:

Male bloggers of all ages write more about politics, technology and money than do their female cohorts. Female bloggers discuss their personal lives -- and use more personal writing style -- much more than males do. Furthermore, for bloggers of each gender, a clear pattern of differences in content and style over age is apparent. Regardless of gender, writing style grows increasingly "male" with age: pronouns and assent/negation become scarcer, while prepositions and determiners become more frequent. Blog words are a clear hallmark of youth, while the use of hyperlinks increases with age. Content also evolves with age in ways that could have been anticipated.

This is not the first time we've observed that "young men talk like old women".

The words in that list are certainly consistent with stereotypes, both of women vs. men and of bloggers vs. the general public. So look here, BBC News: you can pander to your readers' prejudices without actually having to make stuff up! Of course, you'd have to talk with some scientists or even read some scientific articles, rather than just rewriting press releases or riffing about your personal issues, but hey, no pain, no gain.

[Note that James Pennebaker, one of the authors of the paper described above, is also one of the authors of the recent Science paper. Another relevant recent paper from his shop is Newman, M.L., Groom, C.J., Handelman, L.D., & Pennebaker, J.W., "Gender differences in language use: An analysis of 14,000 text samples", Discourse Processes (in press).]

[Update -- Peter Howard writes:

Oh, come on, Mark. It's quite clear from the context that the BBC article you refer to isn't supposed to be science reporting; it's just a bit of fun. It's in the light-hearted *magazine* section of the BBC News website, not in one of their 'serious' columns. Your ongoing criticism of BBC science coverage is usually valuable, but this is just tilting at windmills. (BTW an earlier version of the article contained the phrase 'tongue in cheek', but they seem to have removed that. Perhaps they thought they were stating the bleedin' obvious.)

If you want to have a go at the BBC on this, tackle them on their purportedly serious take on the story ("Men 'no less chatty than women'", 7/5/2007), which contains the sentence:

"The University of Arizona study, in Science, conflicts with previous US research suggesting women talk almost three times as much as men."

That might give you something to legitimately complain about.

Fair enough. But I'm tired of criticizing them for obvious stuff like that, and I thought that Chris Brew's comment was probably an accurate picture of their editorial thought processes. (For some less temperate, but more entertaining, reactions, see here.) And posting on this gave me an opportunity to cite some of the genuine research on the topic, which unfortunately BBC News is highly unlikely to do.

And speaking of genuine research, Tim Finin reminded me of H. Liu and Rada Mihalcea, "Of Men, Women, and Computers: Data-Driven Gender Modeling for Improved User Interfaces", ICWSM 2007.]

Posted by Mark Liberman at July 8, 2007 12:02 PM