Language Log: Cultural specificity and universal values?

December 22, 2006


Alain Bentolila, a linguist at the University of Paris, wrote the recent "Rapport de Mission sur L'Enseignement de la Grammaire", which the French minister of education cited in announcing his program to increase grammar teaching. (See Heidi Harley's post "French Report: It's lucky Copernicus had grammar", 12/18/2006.) When I do a web search for Alain Bentolila, the second item is an interview from L'Express in October of 2002, which covers some interesting ground. Much of it deals with vocabulary rather than with syntax:

Certes, mon oreille souffre lorsqu'on rate un subjonctif, mais l'essentiel est ailleurs: aujourd'hui, un certain nombre de citoyens sont moins capables que les autres d'exprimer leurs pensées avec justesse: 10% des enfants qui entrent au cours préparatoire disposent de moins de 500 mots, au lieu de 1 200 en moyenne pour les autres. Cela a deux conséquences. La première est que leur pouvoir sur le monde s'en trouve limité. La seconde, c'est que cela les enferme dans un ghetto et favorise un communautarisme croissant. Il existe ainsi en France une véritable inégalité linguistique, qui se traduit par une grave inégalité sociale.

Of course, my ear suffers when someone muffs a subjunctive, but the essential problem is elsewhere: today, a certain number of citizens are less capable than others of expressing their thoughts accurately: 10% of children entering the first year of elementary school have fewer than 500 words at their disposal, compared with 1,200 on average for the others. This has two consequences. The first is that this limits their power over the world. The second is that this shuts them up in a ghetto and encourages a growing sense of ethnic identity. Thus in France there is a genuine linguistic inequality, which translates into a serious social inequality.

[For the decision to translate "communautarisme" as "ethnic identity", see this discussion, e.g. "Communautarisme means that people classify themselves according to some private attributes instead of feeling that they belong to a whole."]

This business of quantifying word usage and vocabulary size, for different groups, has come up a lot recently. I know that I'm not going to get anywhere suggesting that journalists should ask people to cite a source for numbers like these. Unfortunately, though, the numbers are nearly meaningless otherwise.

I don't mean that there aren't large individual, class and cultural differences in vocabulary size, and I don't mean to suggest that these differences don't matter. But the particular numbers that Bentolila cites in this interview are surprisingly low, and I wonder where they came from and how they were measured.

There's an excellent review of relevant literature available on line: Scott Baker, Deborah Simmons and Edward Kameenui, "Vocabulary Acquisition: Synthesis of the Research". Here's part of what they have to say about it:

In their review of vocabulary acquisition, Beck and McKeown (1991) noted that estimating vocabulary size was probably the oldest type of vocabulary research. Thus, during the 20th century, scores of studies have focused exclusively on estimating vocabulary size. Given the complexity of defining word knowledge (Baumann & Kameenui, 1991), it is not surprising that such estimates have varied considerably. For example, Graves (1986) reported that studies of vocabulary size conducted prior to 1960 resulted in estimates ranging from 2,500 to 26,000 words for typical first-grade students, and from about 19,000 to 200,000 words for university graduate students. These discrepancies were due to lack of specificity regarding (a) differences between words and word families (e.g., is a student who knows the meaning of run , ran , and running credited with knowing one, two, or three words?); (b) definitions of word knowledge (e.g., recognizing the meaning of a word in a multiple-choice question versus producing a definition for the word); and (c) the source used to represent English vocabulary (e.g., dictionaries versus word frequency lists) (Beck & McKeown, 1991).

As researchers began to specify more precisely the parameters of vocabulary knowledge, more accurate and consistent estimates of vocabulary size were generated. For example, Nagy and Anderson (1984) attempted to determine the number of printed words used in English materials in grades 3 through 9 by examining the textbooks, workbooks, novels, magazines, and encyclopedias used in the classroom. Their estimate of 88,533 word families is now widely used as the domain of words that students in grades 3 through 9 can be expected to know.

Beck and McKeown (1991) provided another estimate of the number of words students know by examining recent studies that used more defined criteria following the tradition established by Nagy and Anderson (1984). Through more precise measures, for example, estimates of the vocabulary size for 5- to 6-year-olds dropped from a range of between 2,500 to 26,000 words to between 2,500 to 5,000 words.
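To make point (a) in that passage concrete, here is a toy sketch in Python. The word list and the hand-built FAMILY mapping are hypothetical, made up for illustration, not taken from any real lexicon or from the studies cited. The same seven known words come out as a vocabulary of 7 or of 3, depending on whether inflected and derived forms are credited separately or collapsed into families:

```python
# Toy illustration: one set of known words yields different "vocabulary sizes"
# depending on whether word forms or word families are counted.
# FAMILY is a hypothetical, hand-made mapping, not a real lexical resource.
FAMILY = {
    "run": "run", "ran": "run", "running": "run", "runs": "run",
    "dog": "dog", "dogs": "dog",
    "happy": "happy", "happily": "happy", "happiness": "happy",
}


def count_forms(known_words):
    """Count each distinct word form separately."""
    return len(set(known_words))


def count_families(known_words):
    """Collapse word forms into families before counting.

    Words missing from FAMILY are treated as their own family.
    """
    return len({FAMILY.get(w, w) for w in known_words})


known = ["run", "ran", "running", "dog", "dogs", "happy", "happiness"]
print(count_forms(known))     # 7
print(count_families(known))  # 3
```

Multiply that kind of discrepancy across a whole lexicon, add different definitions of what it means to "know" a word, and the huge spread in the older estimates is not surprising.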

Beck and McKeown's estimates of vocabulary sizes for kids who are 5 or 6 years old should be roughly comparable to the estimates that Bentolila gives. But 3,750 (the middle of Beck and McKeown's range) is more than three times larger than the 1,200 average given by Bentolila. My guess is that Bentolila is talking about a technique that simply measured the number of different words used in a given time period (or word count) of transcribed speech -- but I don't know. These numbers are invoked to support important social policy choices, and it seems worthwhile to be careful to make it clear where the numbers come from, and what they mean. (After all, we've seen plenty of recent examples where people seem to invent striking numbers to bolster general conclusions about group differences.)
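For what it's worth, here is a minimal sketch of the kind of measure I'm guessing at: counting the distinct word types in the first N tokens of a transcript. The sample sentence and the crude regex tokenizer are assumptions for illustration only; nothing here reflects how Bentolila's figures were actually obtained. The point is just that such a count depends heavily on sample length, so it is a different quantity from an estimate of total vocabulary size:

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def types_in_first_n(tokens, n):
    """Number of distinct word types among the first n tokens."""
    return len(set(tokens[:n]))


def type_growth(tokens, step):
    """Distinct-type counts for windows of step, 2*step, ... tokens."""
    return [(n, types_in_first_n(tokens, n))
            for n in range(step, len(tokens) + 1, step)]


# Hypothetical "transcript", invented for illustration.
sample = "the dog saw the cat and the cat saw the dog run away from the big dog"
toks = tokenize(sample)
print(type_growth(toks, 5))
```

On this toy sample, type_growth returns [(5, 4), (10, 5), (15, 8)]: the count keeps rising as the window grows, which is why a bare number of "words" drawn from a speech sample needs its sample size attached before it can be compared with anything.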

There is little question that large differences exist -- continuing the quote from Baker et al.:

Even as methodological improvements in vocabulary research have occurred, one unequivocal finding has remained: Students with poor vocabularies know alarmingly fewer words than students with rich vocabularies. For example, Beck and McKeown (1991) discussed a study conducted by Smith in 1941, who reported that high-achieving high school seniors knew four times as many words as their low-achieving peers. Smith also reported that high-achieving third graders had vocabularies that were about equal to those of low-achieving twelfth graders.

In 1982, Graves, Brunetti, and Slater (cited in Graves, 1986) reported a study on differences in the reading vocabularies of middle-class and disadvantaged first graders. In a domain of 5,044 words, disadvantaged first graders knew approximately 1,800 words whereas the middle-class students knew approximately 2,700 words. Using a larger domain of words (19,050), Graves and Slater (cited in Graves, 1986) reported that disadvantaged first graders knew about 2,900 words and middle-class first graders approximately 5,800 words.
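A bit of arithmetic on the quoted figures shows how much the apparent gap depends on the test domain. This is only a sketch; the numbers are just the ones in the passage above, and the percentages are simply known words divided by domain size. In the 5,044-word domain, middle-class first graders knew 1.5 times as many words as disadvantaged ones (about 53.5% vs. 35.7% of the domain), while in the 19,050-word domain the ratio was 2.0 (about 30.4% vs. 15.2%):

```python
# Figures as quoted above from Baker et al., citing Graves (1986).
STUDIES = {
    "Graves, Brunetti & Slater (1982)": {
        "domain": 5044, "disadvantaged": 1800, "middle_class": 2700},
    "Graves & Slater": {
        "domain": 19050, "disadvantaged": 2900, "middle_class": 5800},
}


def summarize(domain, disadvantaged, middle_class):
    """Percent of the test domain known by each group, and the between-group ratio."""
    return {
        "pct_disadvantaged": round(100 * disadvantaged / domain, 1),
        "pct_middle_class": round(100 * middle_class / domain, 1),
        "ratio": round(middle_class / disadvantaged, 2),
    }


for name, figures in STUDIES.items():
    print(name, summarize(**figures))
```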

However the differences are measured, the usual explanation for their cause has to do with differences in childhood experience and perhaps in child-rearing culture. See the discussion of Hart and Risley's work in this post for a sketch of current theories about this. Another relevant piece of research is Martha J. Farah et al. ("Childhood poverty: Specific associations with neurocognitive development", Brain Research 1110(1):166-174, September 2006), discussed briefly here, which found a large difference in language-related cognitive measures (an effect size of about 0.95 for vocabulary and sentence-understanding tests) between middle-SES and low-SES African-American girls aged 10 to 13.

Group stereotypes sometimes also enter into this, as Bentolila observes:

Q: Il y aurait une forme de fierté, et même d'identité, à se proclamer inculte?
A: Exactement. L'échec devient un signe de reconnaissance du clan. Autre exemple: dans une classe de CP, dans une ZEP de Villeneuve-Saint-Georges [Val-de-Marne], une enseignante de 21 ans tentait désespérément de faire apprendre le mot «succulent». Un enfant s'est levé et a dit: «Ça, c'est un mot pour les filles.» A 6 ans, cet enfant vit déjà dans un monde coupé en deux, celui où le mot rare est un trésor et celui où il est ridicule.

Q: There would be a kind of pride, and even of identity, in declaring oneself uneducated?
A: Exactly. Failure becomes a sign of clan membership. Another example: in a CP (first-year elementary) class in a ZEP (priority education zone) in Villeneuve-Saint-Georges [Val-de-Marne], a 21-year-old teacher was desperately trying to teach the word "succulent". A child stood up and said, "That's a girl's word." At the age of six, this child already lives in a world cut in two, one where a rare word is a treasure and another where it is ridiculous.

Leading up to this passage, Bentolila offered another anecdote:

Dans une étude récente en Seine-Saint-Denis, on a demandé à des collégiens ce que représentait pour eux la lecture. Plusieurs ont fait cette réponse surprenante: «La lecture, c'est pour les pédés!» Cela signifie que, pour eux, la lecture appartient à un monde efféminé, qui les exclut et qu'ils rejettent. Accepter le livre et la lecture serait passer dans le camp des autres, ce serait une trahison.

In a recent study in Seine-Saint-Denis, they asked schoolboys what reading meant to them. Several gave this surprising answer: "Reading is for faggots!" This means that, for them, reading belongs to an effeminate world, which excludes them and which they reject. To accept books and reading would be to cross over into the others' camp; it would be treason.

This is reminiscent of the language in Leonard Sax's works about the feminization of education in the U.S., and the need to give schoolboys manlier books to read. I'm sympathetic to the complaint and the concern, but I wish the analysis depended less on evocative anecdotes and more on carefully controlled research.

For a start, it would be nice to have a developmental series of speech samples from large, demographically balanced samples of children through elementary and secondary school. This would help us start to understand what the situation really is, and (if the collection were properly done) to distinguish between general linguistic impoverishment (to the extent that it exists) and imperfect knowledge of the standard language (which is surely widespread).

Bentolila addresses this question indirectly. First he claims that "les gamins de banlieue" simply lack linguistic resources entirely:

Q: Mais en quoi la pauvreté du vocabulaire favorise-t-elle le ghetto et le communautarisme?
A: Il y a une loi simple en linguistique: moins on a de mots à sa disposition, plus on les utilise et plus ils perdent en précision. On a alors tendance à compenser l'imprécision de son vocabulaire par la connivence avec ses interlocuteurs, à ne plus communiquer qu'avec un nombre de gens restreint. La pauvreté linguistique favorise le ghetto; le ghetto conforte la pauvreté linguistique. En ce sens, l'insécurité linguistique engendre une sorte d'autisme social. Quand les gamins de banlieue ne maîtrisent que 800 mots, alors que les autres enfants français en possèdent plus de 2 500, il y a un déséquilibre énorme. Tout est «cool», tout est «grave», tout est «niqué», et plus rien n'a de sens. Ces mots sont des baudruches sémantiques: ils ont gonflé au point de dire tout et son contraire. «C'est grave» peut signifier «c'est merveilleux» comme «c'est épouvantable».

Q: But in what way does a poor vocabulary encourage the ghetto and ethnic identity?
A: There is a simple law in linguistics: the fewer words one has at one's command, the more one uses them and the more they lose precision. You then have a tendency to compensate for imprecision of vocabulary by conniving with your interlocutors, no longer trying to communicate beyond a small circle of people. Linguistic poverty encourages the ghetto; the ghetto reinforces linguistic poverty. In this sense, linguistic insecurity creates a sort of social autism. When the banlieue kids only master 800 words, when other French children have more than 2,500, there is an enormous imbalance. Everything is "cool", everything is "heavy", everything is "fucked", and nothing has meaning anymore. These words are semantic bladders: they have inflated to the point of meaning everything and its opposite. "C'est grave" (= it's serious, it's heavy, etc.) can mean "it's marvellous" as well as "it's dreadful".

The interviewer raises the obvious objection, which is that these areas of "linguistic poverty" have been the source of much linguistic innovation:

Q: On vous dira que, dans les banlieues, on invente aussi des mots nouveaux qui sont, eux, très précis.
A: C'est de la démagogie! Ces néologismes sont spécifiques des banlieues et confortent le ghetto. L'effet est toujours centrifuge. Les enfants des milieux aisés vampirisent le vocabulaire des cités, mais ils disposent aussi du langage général qui leur permet d'affronter le monde. L'inverse n'est pas vrai. Arrêtons de nous ébahir devant ces groupes de rap et d'en faire de nouveaux Baudelaire! La spécificité culturelle ne justifie jamais que l'on renonce en son nom à des valeurs universelles.

Q: Some will say that in the banlieues they also invent new words, which are quite precise.
A: That's demagogy! Those neologisms are specific to the banlieues and reinforce the ghetto. The effect is always centrifugal. Children from comfortable backgrounds steal the vocabulary of the housing projects (cités), but they also control the standard language which allows them to engage the world at large. The reverse is not true. Let's stop getting giddy over rap groups and making them into new Baudelaires! Cultural specificity never justifies renouncing universal values.

John McWhorter has engaged a similar set of issues in his books "Losing the Race: Self-Sabotage in Black America" (2001) and "Winning the Race: Beyond the Crisis in Black America" (2005). However, although John is a linguist (in fact, a Language Log contributor), and he has often argued for the value of teaching the standard language, his emphasis has been on content rather than on vocabulary counts and grammatical analysis. For example, he has argued against rap music on the basis of the attitudes and actions it glorifies and encourages, not on the basis of its deviations from standard English.

And I won't put words in John's mouth, but I bet he agrees with me that it's odd to describe the vocabulary of standard French as embodying "universal values" while other vocabularies are "culturally specific". I mean, if you want universal values, you're talking about English, right?

Of course, being broad-minded here at Language Log, we're happy to allow the French to retain their cultural and linguistic specificity, even though their linguistic insecurity does create a sort of social autism, limiting their opportunities for international communication and forcing them to turn inwards and connive, in a lexically-impoverished idiom, with their narrowing circle of francophone interlocutors.

Posted by Mark Liberman at December 22, 2006 04:51 PM