April 26, 2004

Formal children

Fernando Pereira at Fresh Tracks describes a discussion with his wife Ana about the relative frequency of the Portuguese words for "child" (criança/as vs. menino/a/os/as) in Portugal and in Brazil. The discussion was provoked by a passage in John McWhorter's book The Power of Babel, which Ana had been reading, and they explored the answer by reference to ratios of ghits.

It used to be that an unabridged dictionary and an encyclopedia would be kept accessible in middle-class homes, for settling questions of language or fact. Now the dictionary is likely to be an online one, and "the internet" is likely to be used for fact finding in place of the encyclopedia. I'm also seeing more and more cases of people using Google and similar search facilities to address usage questions by counting things. Of course, Fernando and Ana are hardly an ordinary couple in this respect.

If I understand Fernando's post, John turns out to be correct, more or less. The most interesting part is the apparent interaction between singular vs. plural and Portugal vs. Brazil (numbers are ratios of criança to menino):


Fernando offers an explanation in terms of the interaction between formality (greater in Portugal) and the specification of gender (which is required for forms of menino but not criança, and thus favors menino in informal uses, since one is likely to be talking about particular kids whose identities and genders are known). The idea seems plausible; it could be tested by examining a random sample of uses of each form. In this connection, it would be nice if Google (or a similar search engine) could be persuaded to return a random sample of the hits for a particular query, rather than the usual relevance-ordered list. Doing one's own pseudorandom sampling is not possible, since Google will not serve up results for starting points beyond 1,000 -- and the top 1,000 ghits are almost always a biased sample.
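
For concreteness, here is a minimal sketch of the kind of sampling I have in mind, assuming (hypothetically) that one could iterate over the complete list of hits for a query, which Google does not currently allow. The function name reservoir_sample and the hits iterable are my own inventions for illustration, not part of any search engine's interface. The technique is standard reservoir sampling, which draws a uniform random sample of k items in a single pass without knowing the total number of hits in advance.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(hits: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from `hits`.

    `hits` is a hypothetical stream of search results; any iterable works.
    Each item ends up in the sample with equal probability, regardless of
    its position in the (relevance-ordered) stream.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, hit in enumerate(hits):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(hit)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen earlier one.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = hit
    return sample


if __name__ == "__main__":
    # Stand-in for a long list of hits: just the integers 0..99999.
    print(reservoir_sample(range(100_000), 10, seed=0))
```

With a sample like this in hand, one could read through a few dozen uses of each form and classify them by formality, instead of relying on whatever the ranking happens to put first.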

Anyhow, there is a partly analogous case in English with child/children and kid/kids: the former seems more formal and also more British, while the latter seems more informal and also more American. Of course, there is no gender marking involved in either word. In both the .com domain (probably mostly American) and the .co.uk domain (certainly mostly British), forms of child are commoner than forms of kid. However, the .com domain definitely has relatively more kid/kids (confirming that it is an Americanism). Moreover, the effect of singular vs. plural is opposite in the two domains (numbers are ratios of child to kid (singular) or children to kids (plural)). The notion that this is an effect of formality more than geography is supported by the fact that the .edu domain, which is almost all American, is even more strongly dominated by child/children than the .co.uk domain is:


[Update: David Nash points out that because of Google's propensity to ignore apostrophes, the estimates of children/kids ratios are too low, since Google will lump kid's in with kids. (So will all too many writers, alas.) David suggests that counts for childs/child's might help balance things, but one would have to figure out how many of those are the name rather than the possessive form of child.]

Posted by Mark Liberman at April 26, 2004 09:43 PM