She and he in the Wall Street Journal

As part of a project on non-standard pronoun case in coordination (Me and him did it; between you and I), Stanford student Tommy Grano did some searching through various sources of data, among them the Wall Street Journal corpus (formal writing, containing very few non-standard pronouns) and AltaVista (more informal writing, with more non-standard pronouns, but still not many), and stumbled on a much larger and more striking difference between the two sources as representatives of different genres: a big sex bias in the WSJ.

There were five relevant variables in this little study, which looked at conjunctions of personal pronouns with nonpronominal NPs: person/number of the pronoun; case of the pronoun (nominative vs. accusative); order of pronoun and the NP; grammatical function of the coordination (subject vs. direct object vs. object of a preposition); and source (WSJ vs. AV). Grano found small numbers for some person/number combinations (1/pl and 3/pl), for some orders of pronoun and NP, and for non-standard case choices. But substantial numbers of examples appeared for the standard subject case choices "NP and I", "he and NP", and "she and NP". (Other studies suggest that "I" prefers second position, while the other pronouns tend to prefer first position, which puts light elements before heavier ones.)
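For readers who want to try this kind of count themselves, here is a minimal sketch in Python of tallying the three standard subject patterns and turning the counts into percentages of a total. It is not Grano's actual procedure, which isn't described here; the regular expressions, the crude approximation of "NP", the function names, and the sample sentences are all my own assumptions for illustration. A real study would need parsed data and hand checking (these patterns would, for instance, wrongly count "He and I" as "NP and I").

```python
import re
from collections import Counter

# Hypothetical, very rough stand-in for a nonpronominal NP: either a
# determiner/possessive plus one word, or a single capitalized word.
NP = r"(?:(?:[Tt]he|[Aa]n?|[Hh]is|[Hh]er|[Mm]y)\s+\w+|[A-Z]\w+)"

# One pattern per standard subject coordination discussed above.
PATTERNS = {
    "NP and I": re.compile(rf"\b{NP}\s+and\s+I\b"),
    "he and NP": re.compile(rf"\b[Hh]e\s+and\s+{NP}"),
    "she and NP": re.compile(rf"\b[Ss]he\s+and\s+{NP}"),
}

def tally(sentences):
    """Count matches of each pattern across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            counts[label] += len(pattern.findall(sentence))
    return counts

def percentages(counts, total):
    """Express each count as a percentage of the given total."""
    return {label: 100 * n / total for label, n in counts.items()}

# Invented sentences, for illustration only (not corpus data).
SAMPLE = [
    "He and the chairman met on Tuesday.",
    "She and Smith disagreed.",
    "My wife and I left early.",
    "Smith and I agreed, but he and his lawyer did not.",
]

counts = tally(SAMPLE)
shares = percentages(counts, sum(counts.values()))
```

In the study itself the denominator would be the total number of pronoun-NP conjunctions found in each source, not the sum of just these three patterns as in the toy example.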

There were 85 conjunctions of pronoun and NP in WSJ, 574 in AV.  The results, expressed as percentages of these totals for particular combinations, source by source:

[Table: rows "NP and I", "he and NP", "she and NP"; columns WSJ and AV; percentage values missing]

Conjunctions involving 1/sg are pretty much the same in the two sources, but those involving 3/sg are wildly different: in AV, male and female are more or less comparable, though with an advantage to female; but in WSJ, it's male over female by an enormous margin.  The Wall Street Journal seems to talk about men in connection with others (other people, or ideas, or whatever) vastly more than women.  In everyday life, as sampled (however imperfectly) by AltaVista, women are slightly in the majority, but in the world of public events, women are of little note.  Well, we knew that, but, still, I was a bit shaken by the size of the difference.

[Buried in that table is the fact that in WSJ, a full 70% of the conjunctions are 3/sg, while in AV it's only 44%. Not surprising, since AV has a lot of second-person reference (which was not of much interest to Grano, since "you" shows no case differentiation).]

