October 06, 2007

The Pirahã and us

The Pirahã language and culture seem to lack not only the words but also the concepts for numbers, using instead less precise terms like "small size", "large size" and "collection". And the Pirahã people themselves seem to be surprisingly uninterested in learning about numbers, and even actively resistant to doing so, despite the fact that in their frequent dealings with traders they have a practical need to evaluate and compare numerical expressions. A similar situation seems to obtain among some other groups in Amazonia, and a lack of indigenous words for numbers has been reported elsewhere in the world.

Many people find this hard to believe. These are simple and natural concepts, of great practical importance: how could rational people resist learning to understand and use them? I don't know the answer. But I do know that we can investigate a strictly comparable case, equally puzzling to me, right here in the U.S. of A.

Until about a hundred years ago, our language and culture lacked the words and ideas needed to deal with the evaluation and comparison of sampled properties of groups. Even today, only a minuscule proportion of the U.S. population understands even the simplest form of these concepts and terms. Out of the roughly 300 million Americans, I doubt that as many as 500,000 grasp these ideas to any practical extent, and 50,000 might be a better estimate. The rest of the population is surprisingly uninterested in learning, and even actively resists the intermittent attempts to teach them, despite the fact that in their frequent dealings with social and biomedical scientists they have a practical need to evaluate and compare the numerical properties of representative samples.

If we project this state of affairs onto the scale of the Pirahã society, with roughly 300 members, we arrive at something like 0.05 to 0.5 people out of 300 who understand how to count and compare quantities in ways that have become essential to the culture. In this respect, I submit, we are exactly like them.
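The projection above is simple proportional scaling, and it can be checked in a few lines. This is only a sketch of the arithmetic: the population figures are the rough ones quoted in the text, and the function name `scaled_count` is made up for illustration.

```python
# Rough figures quoted in the text; both are approximations.
US_POPULATION = 300_000_000
PIRAHA_POPULATION = 300


def scaled_count(n_people, source=US_POPULATION, target=PIRAHA_POPULATION):
    """Scale a head count from the source population to the target population."""
    return n_people * target / source


# The two guesses for how many Americans grasp sample statistics.
low = scaled_count(50_000)
high = scaled_count(500_000)
print(f"{low:.2f} to {high:.2f} people out of {PIRAHA_POPULATION}")
```

In other words, the more generous guess still leaves us with half a person in a village of 300.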

For English-language terms dealing with the comparison of sample statistics, the OED's first citations generally span the period from around 1880 to 1940 (emphasis added):

1885 F. GALTON in Jrnl. Anthropol. Inst. 14 276 The value which 50 per cent. exceeded, and 50 per cent. fell short of, is the Median Value, or the 50th per-centile.

1895 K. PEARSON in Philos. Trans. R. Soc. A. CLXXXVI. 399 The histogram shows, however, the amount of deviation at the extremes of the curve. [Note. The word "histogram" was] introduced by the writer in his lectures on statistics as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base.

1894 K. PEARSON in Phil. Trans. R. Soc. A. CLXXXV. 80 Then σ will be termed its standard-deviation (error of mean square).

1895 K. PEARSON in Phil. Trans. R. Soc. A. CLXXXVI. 412 A method is given of expressing any frequency distribution by a series of differences of inverse factorials with arbitrary constants.

1918 R. A. FISHER in Trans. R. Soc. Edin. LII. 399 It is..desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance.

1934 J. NEYMAN in Jrnl. R. Statistical Soc. XCVII. 562 The form of this solution consists in determining certain intervals, which I propose to call the confidence intervals.., in which we may assume are contained the values of the estimated characters of the population, the probability of an error in a statement of this sort being equal to or less than 1 - ε, where ε is any number 0<ε<1, chosen in advance. The number ε I call the confidence coefficient.
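For readers who want to see what some of these terms amount to in practice, here is a minimal sketch using Python's standard `statistics` module. The measurements are invented for illustration, and the library's default conventions (sample rather than population variance, and its default quantile interpolation) are one choice among several.

```python
import math
import statistics

# Invented measurements, purely for illustration.
heights_cm = [150, 155, 160, 162, 165, 168, 170, 175, 180, 190]

# Galton's median: the value that half the sample exceeds and half falls short of.
median = statistics.median(heights_cm)

# The same value viewed as the 50th percentile. quantiles(n=100) returns the
# 99 cut points between percentiles, so index 49 is the 50th.
percentiles = statistics.quantiles(heights_cm, n=100)
fiftieth = percentiles[49]

# Pearson's standard deviation and Fisher's variance (its square).
sd = statistics.stdev(heights_cm)
var = statistics.variance(heights_cm)

print("median:", median)
print("50th percentile:", fiftieth)
print("standard deviation:", round(sd, 2))
print("variance:", round(var, 2))
print("variance equals sd squared:", math.isclose(var, sd ** 2))
```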

Before 1900 or so, only a few mathematical geniuses like Gauss (1777-1855) had any real ability to deal with these issues. But even today, most of the population still relies on crude modes of expression like the attribution of numerical properties to prototypes ("A woman uses about 20,000 words per day while a man uses about 7,000") or the comparison of bare-plural nouns ("men are happier than women").

Sometimes, people are just avoiding more cumbersome modes of expression -- "Xs are P-er than Ys" instead of (say) "The mean P measurement in a sample of Xs was greater than the mean P measurement in a sample of Ys, by an amount that would arise by chance fewer than once in 20 trials, assuming that the two samples were drawn from a single population in which P is normally distributed". But I submit that even most intellectuals don't really know how to think about the evaluation and comparison of distributions -- not even simple univariate Gaussian distributions, much less more complex situations. And many people who do sort of understand this, at some level, generally fall back on thinking (as well as talking) about properties of group prototypes rather than properties of distributions of individual characteristics.
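To make the cumbersome version concrete, here is a rough sketch of that comparison in Python. Several things here are assumptions rather than anything from a real study: the samples are simulated, the function name `compare_means` is hypothetical, and the p-value uses a large-sample normal approximation in place of an exact t-test. The last step estimates how much the two fitted distributions overlap, which is exactly what "Xs are P-er than Ys" leaves out.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev, variance


def compare_means(xs, ys):
    """Return (difference in sample means, approximate one-sided p-value).

    The p-value is the chance of a difference at least this large if both
    samples came from a single population, using a normal approximation
    that is only reasonable for fairly large samples.
    """
    diff = mean(xs) - mean(ys)
    standard_error = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    z = diff / standard_error
    p_value = 1.0 - NormalDist().cdf(z)
    return diff, p_value


# Simulated measurements of P: two groups whose true means differ slightly
# relative to the spread within each group.
rng = random.Random(0)
xs = [rng.gauss(10.3, 2.0) for _ in range(500)]
ys = [rng.gauss(10.0, 2.0) for _ in range(500)]

diff, p_value = compare_means(xs, ys)

# Fit a normal curve to each sample and measure the shared area (0 to 1).
overlap = NormalDist(mean(xs), stdev(xs)).overlap(NormalDist(mean(ys), stdev(ys)))

print(f"difference in means: {diff:.3f}")
print(f"approximate p-value: {p_value:.4f}")
print("fewer than once in 20 trials" if p_value < 0.05 else "not below the 1-in-20 threshold")
print(f"overlap of fitted distributions: {overlap:.2f}")
```

With groups like these, a difference in means can pass the one-in-20 threshold while the two distributions remain mostly on top of each other, so that a randomly chosen Y will quite often out-measure a randomly chosen X.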

If you're one of the people who find distribution-talk mystifying, and don't really see why you should have to learn it, or perhaps think that you're just not the kind of person who learns things like this -- congratulations, you now know exactly how (I imagine) the Pirahã feel about number-talk.

Does this matter? Well, in the newspapers every week, there are dozens of stories about risks and rewards, epidemiology and politics, social trends and psychological differences, with serious public-policy and personal-lifestyle implications, which you can't understand without understanding distribution-talk. And usually you won't just feel baffled -- instead, you'll think you understand, and draw the wrong conclusions.

In fact, the people who write these stories mostly don't understand distribution-talk themselves, and in any case they believe that they need to write for an audience that doesn't understand it. As a result, news stories on these topics are usually impossible to understand correctly unless you go back to the primary sources in order to recover the information that's been distorted or omitted. I imagine that something similar must happen when one Pirahã tells another about the deal that this month's river trader is offering on knives.

If you're one of the small minority who does understand distribution-talk, and you're thinking "well, maybe all those English majors don't get it, but all the technically savvy people do", please go back and read this quietly hilarious account of three geek journalists wrestling with the concept of the "long tail". And then imagine three Pirahã joking around about the number seven.

Posted by Mark Liberman at October 6, 2007 06:18 AM