August 31, 2004

Humans context-free, monkeys finite-state? Apparently not.

Forthcoming in Psychonomic Bulletin and Review is a paper by Pierre Perruchet and Arnaud Rey, entitled "Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates?" This is a response to a paper by Tecumseh Fitch and Marc Hauser that appeared earlier this year (Computational Constraints on Syntactic Processing in a Nonhuman Primate, Science, Vol 303, Issue 5656, 377-380, 16 January 2004).

Both papers address the question in Perruchet and Rey's title: "Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates?" Fitch and Hauser presented experimental evidence that the answer should be "yes". Perruchet and Rey do a different experiment of the same kind, and find that the answer is "no".

For an idea about why this is interesting, read Fitch & Hauser's abstract:

The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

Here in contrast is Perruchet and Rey's abstract:

In a recent Science paper, Fitch and Hauser (2004; hereafter, F&H) claimed to have demonstrated that Cotton-top Tamarins fail to learn an artificial language produced by a Phrase Structure Grammar (PSG, Chomsky, 1957) generating center-embedded sentences, while adult humans easily learn such a language. We report an experiment replicating the results of F&H in humans, but also showing that participants learned the language without exploiting in any way the center-embedded structure. When the procedure was modified to make the processing of this structure mandatory, participants no longer showed evidence of learning. We propose a simple interpretation for the difference in performance observed in F&H's task between humans and Tamarins, and argue that, beyond the specific drawbacks inherent to F&H's study, researching the source of the inability of nonhuman primates to master language within a framework built around the Chomsky's hierarchy of grammars is a conceptual dead-end.

I described the Fitch & Hauser results back in January, and I also criticized their paper for seriously overinterpreting the results of the experiments it reported. I suggested that these might be just "[experiments] about memory span and/or sensitivity to statistical deviations [in local word-sequence counts]. No talk about grammars, much less hierarchies of grammatical complexity, is required".

Other critiques, for example this one by Greg Kochanski at Oxford, emphasized the long-established fact that humans have a great deal of difficulty with center-embedded structures. Fitch & Hauser interpreted their results to mean that human subjects could handle center-embedded structures easily, up to four levels of recursion (three levels are cited in the paper, and an additional level in the background material given on the Science website). As Greg observed, it seems much more likely that the human subjects were using some other method, unrelated to CFG parsing and perhaps not involving any overall grammatical analysis at all. There are plenty of candidates for techniques other than CFG parsing that would work in this particular case, one of which is the approach based on bigram statistics that I suggested.

Perruchet & Rey suggest that Fitch & Hauser's human subjects might have been using exactly such a strategy: "A parsimonious interpretation may be that human participants simply discriminated the cases where there was one female-to-male voice transition (AABB or AAABBB) from the cases in which there were two or three consecutive alternations (ABAB or ABABAB)." But Perruchet & Rey don't just suggest that a strategy other than CFG parsing was used -- they do an experiment to prove it.

You can read their paper for the details, as well as for much interesting discussion. Their experimental materials were almost identical to Fitch & Hauser's -- sets of spoken syllables in either a male or a female voice, arranged in patterns that either alternate -- (AB)n -- or nest -- AnBn -- where "A" means "syllable spoken in a female voice" and "B" means "syllable spoken in a male voice", and n was either 2 or 3. Just as in Fitch & Hauser's experiment, the set of syllables used for the female voice was different from the set of syllables used for the male voice (though the sets used were slightly different, in order to accommodate the fact that the subjects were French): {ba di ro tu la mi no vu} for the female voice, and {sa li mo nu ka bi do gu} for the male voice.

The key difference was that in the nested case, the corresponding A's and B's were constrained to be paired in a fixed way, unlike in Fitch & Hauser's experiment, where no such constraint was imposed. Thus if the lists were matched in the order as given above, examples of grammatical center-embedded patterns would be minodobi -- because mi matches bi and no matches do -- and batuvugunusa -- because ba matches sa, tu matches nu and vu matches gu; whereas patterns such as minobido would be ungrammatical, since do is not the proper pairing for mi, and bi is not the proper pairing for no.
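To make the constraint concrete, here is a minimal Python sketch of a checker for the paired, center-embedded pattern. It is an illustration only, not P&R's procedure: it assumes the two syllable lists are matched in the order given above, it works on the written forms (so the voice of each syllable is implied by which list it comes from), and the names FEMALE, MALE, PAIR, syllables and is_paired_nesting are made up for the example.

```python
# Syllable sets from the experiment, matched position by position
# (the matching order is the assumption used in the examples above).
FEMALE = "ba di ro tu la mi no vu".split()
MALE = "sa li mo nu ka bi do gu".split()
PAIR = dict(zip(FEMALE, MALE))  # e.g. PAIR["mi"] == "bi"


def syllables(s):
    """Split a written string into two-letter syllables."""
    return [s[i:i + 2] for i in range(0, len(s), 2)]


def is_paired_nesting(s):
    """True if s is A1..An Bn..B1, where each Bi is the partner of Ai."""
    syl = syllables(s)
    n, rem = divmod(len(syl), 2)
    if rem or n == 0:
        return False
    first, second = syl[:n], syl[n:]
    # The i-th A from the left must pair with the i-th B from the right.
    return all(PAIR.get(a) == b for a, b in zip(first, reversed(second)))


for s in ("minodobi", "batuvugunusa", "minobido"):
    print(s, is_paired_nesting(s))
```

The first two strings are accepted and minobido is rejected, because reading from the outside in, mi would have to be closed by bi rather than do.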

As you can see, this turns a trivial task ("is the string a sequence of high-pitched syllables followed by a sequence of low-pitched syllables?") into a rather difficult one ("are the high- and low-pitched syllables matched according to the constraints of a context-free grammar?"). Unfortunately for Fitch & Hauser, the first task is so easy (for humans) partly because it can be solved using trivial heuristics that have nothing to do with context-free grammars, such as "is there more than one female-to-male transition in the sequence?" -- or simply "is the sequence one of the two (!) sentences in the language?"
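Both heuristics fit in a few lines. The sketch below is only an illustration, under the assumption that each stimulus is reduced to its sequence of voices, written as a string of "A" (female) and "B" (male); the function names are invented for the example.

```python
def female_to_male_transitions(voices):
    """Count the places where a female-voice syllable (A) is followed by a male-voice one (B)."""
    return sum(1 for prev, cur in zip(voices, voices[1:])
               if prev == "A" and cur == "B")


def one_transition(voices):
    """Heuristic 1: exactly one female-to-male transition."""
    return female_to_male_transitions(voices) == 1


def is_known_sentence(voices):
    """Heuristic 2: at the voice level, AnBn with n = 2 or 3 has only two sentences."""
    return voices in {"AABB", "AAABBB"}


for v in ("AABB", "AAABBB", "ABAB", "ABABAB"):
    print(v, one_transition(v), is_known_sentence(v))
```

Neither function counts or matches anything: one_transition would just as happily accept AAAB, which is not in AnBn. On the four test patterns, though, both give exactly the answers that a real AnBn acceptor would.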

In P&R's experiment, the human subjects were not sensitive to the CFG-generated structure of the test language. They showed the same effect of "acoustic pattern" (i.e. one high-to-low transition versus multiple high-to-low transitions) as F&H's subjects did (83% correct vs. 85% correct), but on "grammaticality" they scored at chance. Furthermore, the "acoustic pattern" effect was stronger for longer strings than for shorter ones, which is the opposite of the effect predicted if the subjects had really been parsing the strings, as opposed to noting the number of high-to-low transitions.

Fitch & Hauser's grammars were far too easy to permit any general conclusions, because a valid acceptor could use a trivial heuristic, totally unrelated to any interesting general properties of the grammar types in question, and certainly without any relationship to the very broad interpretation that F&H give to the results. On the other hand, Perruchet & Rey's grammar may have been unnecessarily difficult. Each of their subjects was required to notice and learn an arbitrary pairing between two sets of 8 CV syllables. But a CFG of the AnBn type only requires that the A's and B's in the string should match in inverted order, as P&R remind us, not that the matches involve some arbitrary mapping of terminal symbols. A much easier grammar to learn would be one in which the pairing (of left and right matching elements) is identity. Then e.g. batuvuvutuba and minonomi would be grammatical, while batuvuvubatu and minomino would not. There are still some problems here with simple non-CFG heuristics, especially if the strings are limited to n=2 and n=3, so human success at this task (if it were to be found!) would still need careful interpretation. I'm not sure what to predict about the outcome of such an experiment.
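For what it's worth, the identity-pairing version is easy to state as code. This is a hypothetical sketch of the proposed grammar, not anything from either paper: it again works on written two-letter syllables, ignores the voice distinction (which the written forms don't show), and the name is_mirror is made up.

```python
def is_mirror(s):
    """True if the second half of s repeats the first half's syllables in reverse order."""
    syl = [s[i:i + 2] for i in range(0, len(s), 2)]
    n, rem = divmod(len(syl), 2)
    if rem or n == 0:
        return False
    # Identity pairing: each syllable is closed by a copy of itself,
    # innermost pair first.
    return syl[:n] == syl[n:][::-1]


for s in ("batuvuvutuba", "minonomi", "batuvuvubatu", "minomino"):
    print(s, is_mirror(s))
```

The first two strings come out grammatical and the last two do not, matching the examples above.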

In any case, P&R put the ball firmly back in the court of anyone who wants to claim a relationship between the levels of the Chomsky hierarchy and the different propensities of humans and monkeys to notice things about sets of strings of spoken syllables.

Although P&R's experiment dealt only with human learning, and empirically challenged only the human half of the Fitch & Hauser paper, they also offer an interesting speculation about what might be going on with the monkeys:

... humans and monkeys were submitted to quite different tests. Students were asked to discriminate the strings consistent and inconsistent with regard to the sound pattern heard previously, and they presumably tuned their response criterion in order to share their responses roughly equally among "same" and "different". By contrast, Tamarins presumably turned towards the loudspeaker only if the sounds emitted by the loudspeaker were biologically significant. This difference deeply undermines a direct comparison between the performances of humans and Tamarins. But why did Tamarins turn towards the loudspeaker when they heard AAABBB after being familiarized with ABABAB, and not the reverse? Although we are limited to speculations, one hypothesis is the following. As any reader can check from listening to the sounds available on the Science web site, the AAABBB strings sound much more like natural human language than the succession of syllables alternately spelled out by the female and male voices that composed the ABABAB strings. This may explain why Tamarins selectively oriented towards the loudspeakers when they heard AAABBB after having been familiarized with the other structure. The reverse did not occur, possibly because the "novelty" introduced by ABABAB presented no potential interest (e.g., the new sounds could not cue the possible presence of humans).

This speculation seems plausible to me, though there's still an interesting question about what constitutes "sounding much more like natural human language" to Tamarins. As P&R suggest, this idea replaces a rather artificial task ("is this jabber similar to the jabber you heard before, or not?") with a more natural one ("is this jabber likely to indicate that humans are around, or not?"). The second task is one that the cotton-top tamarins would have had a lot of previous experience with, since they have been raised in captivity by human keepers, whose presence is likely to have been associated in the past with strong reinforcers, both positive and negative.

[Note: there is a typo worth correcting in the Perruchet & Rey paper, pointed out to me by Geoff Pullum: in the second line on p. 4, they write (ABn) where they mean (AB)n.]


Posted by Mark Liberman at August 31, 2004 10:35 AM