February 25, 2006

Learnable and unlearnable patterns -- of what?

A couple of weeks ago, I suggested that some much-discussed recent research on "grammar learning" by monkeys might involve sensitivity to relations of sameness and difference in strings, which can't be represented as grammatical constraints (of whatever complexity) on sequences of specific items. Mark Seidenberg pointed out to me that this same issue came up in a 1999 discussion in Science following a paper by Gary Marcus et al. Links to the discussion are available on Mark's website, on a page of readings for a presentation on Language and the Mind -- the relevant items are #5, #6, #7.

The first paper is Marcus, G.F., Vijayan, S., Bandi Rao, S., & Vishton, P.M. (1999). "Rule learning by seven-month-old infants", Science 283, 77-80. The abstract:

A fundamental task of language acquisition is to extract abstract algebraic rules. Three experiments show that 7-month-old infants attend longer to sentences with unfamiliar structures than to sentences with familiar structures. The design of the artificial language task used in these experiments ensured that this discrimination could not be performed by counting, by a system that is sensitive only to transitional probabilities, or by a popular class of simple neural network models. Instead, these results suggest that infants can represent, extract, and generalize abstract algebraic rules.

The approach was to familiarize infants with syllable sequences having either the pattern ABA or ABB, where ABA might correspond to "ga ti ga" or "li na li", and ABB to "ga ti ti" or "li na na". Then the infants were tested for their relative interest in sets of new ABA or ABB patterns, by measuring how long they looked at a flashing light associated with the source of the sound. In another experiment, AAB patterns were compared with ABB patterns. The point was that "a system that was sensitive only to transitional probabilities between words could not account for any of these results, because all the words in the test sentences are novel and, hence, their transitional probabilities (with respect to the familiarization corpus) are all zero".
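To make that point concrete, here's a toy calculation in Python (mine, not anything from the Marcus et al. paper; the syllable strings are just illustrative). Every bigram in the test items is unseen in the familiarization corpus, so estimated transitional probabilities between specific syllables come out uniformly zero and can't separate the consistent pattern from the inconsistent one:

```python
from collections import Counter

familiarization = [["ga", "ti", "ga"], ["li", "na", "li"]]  # ABA habituation strings
test_items = [["wo", "fe", "wo"], ["wo", "fe", "fe"]]       # novel syllables: ABA vs. ABB

bigrams, unigrams = Counter(), Counter()
for s in familiarization:
    for a, b in zip(s, s[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def tp(a, b):
    # transitional probability P(b | a), estimated from the familiarization corpus
    return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

for s in test_items:
    print(s, [tp(a, b) for a, b in zip(s, s[1:])])
# both test items come out as [0.0, 0.0]: syllable-level transitional
# probabilities are blind to the ABA/ABB contrast on novel syllables
```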

This led to a set of responses: "Do Infants Learn Grammar with Algebra or Statistics?", letters from Seidenberg & Elman, Negishi, Eimas, and Marcus (1999), Science 284, 433.

The letter from Seidenberg and Elman argued that

... the conclusion by Marcus et al. that the infants had learned rules rather than merely statistical regularities is unwarranted. ... these "grammatical rules" created other statistical regularities. AAB, for example, indicated that a syllable would be followed by another instance of the same syllable and then a different syllable. Thus, in the pretraining phase, the infant was exposed to a statistical regularity governing sequences of perceptually similar and different events.
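Their point is easy to make concrete. In the sketch below (my illustration, not Seidenberg and Elman's model), each syllable is re-coded as "same" or "diff" relative to its predecessor; over that two-symbol alphabet, perfectly ordinary sequence statistics distinguish the patterns, novel syllables and all:

```python
def relational_code(syllables):
    # re-code each syllable as "same" or "diff" relative to its predecessor
    return tuple("same" if b == a else "diff"
                 for a, b in zip(syllables, syllables[1:]))

print(relational_code(["ga", "ti", "ti"]))  # ('diff', 'same') -- an ABB string
print(relational_code(["wo", "fe", "fe"]))  # ('diff', 'same') -- novel syllables, same code
print(relational_code(["ga", "ga", "ti"]))  # ('same', 'diff') -- AAB is distinct
print(relational_code(["ga", "ti", "ga"]))  # ('diff', 'diff') -- so is ABA
```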

The letter from Peter Eimas observed that

... there is evidence that 7-month-old infants can discriminate objects by means of the abstract relations, same or different.

citing D. J. Tyrrell, L. B. Stauffer, L. B. Snowman, Infant Behav. Dev. 14, 125 (1991).

In a later issue, we get Altmann, G.T.M. & Dienes, Z. (1999), "Rule learning by seven-month-old infants and neural networks", Science 284, 875a. This comment shows that a previously developed PDP network for learning sequential patterns -- published in 1995 -- behaved similarly to the 7-month-olds in the cited paradigm:

Like the infants studied by Marcus et al., our networks successfully discriminated between the test stimuli. The conclusions by Marcus et al. stated in the report are premature; a popular class of neural network can model aspects of their own data, as well as substantially more complex data than those in the report. The cognitive processes of 7-month-old infants may not be so different from statistical learning mechanisms after all.

It seems to me that it's not very helpful to try to draw a distinction between learning "abstract rules" and learning "statistics", without being very precise about what kind of "rules" and what kind of "statistics" are at issue. Marcus et al. tried to create patterns that "statistics" couldn't learn -- and indeed it's true that counting syllable n-grams would not distinguish their patterns, as long as disjoint sets of syllables are used in familiarization and test. But for exactly the same reasons, no formal grammar constraining sequences of a specific terminal vocabulary could solve their problem either. In contrast, a learner who pays attention to patterns of same and different items can learn their distinctions by trivial methods -- either "statistical" or "grammatical".
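For instance, here's about the most trivial "grammatical" method I can think of (a sketch of my own, not a published model): reduce each string to its pattern of repeated items, so that every ABA string, whatever its syllables, maps to the same canonical form, and classification is exact match:

```python
def identity_pattern(syllables):
    # number each distinct syllable by order of first occurrence,
    # abstracting away from the specific items
    first_seen = {}
    for syl in syllables:
        first_seen.setdefault(syl, len(first_seen))
    return tuple(first_seen[syl] for syl in syllables)

print(identity_pattern(["ga", "ti", "ga"]))  # (0, 1, 0)
print(identity_pattern(["wo", "fe", "wo"]))  # (0, 1, 0) -- same pattern, novel syllables
print(identity_pattern(["wo", "fe", "fe"]))  # (0, 1, 1) -- ABB comes out distinct
```

Whether you then match these canonical forms against a stored "rule" or tally their frequencies seems like a distinction without much of a difference.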

In the case of the Fitch & Hauser paper that we discussed two years ago, the patterns ABAB and ABABAB vs. AABB and AAABBB were the same in both familiarization and testing phases. (More exactly, A and B represent classes of syllables, with A being one of {ba di yo tu la mi no wu} spoken by a single female speaker, while B is one of {pa li mo nu ka bi do gu} spoken by a single male speaker; strings of syllables are formed by random selection from the respective sets without replacement.) As a result, the monkeys (and the undergraduates) might have been learning either sequences of item-classes (probably just "female voice" vs. "male voice"), or sequences of same vs. different item-classes, or (most likely) both. And they might have learned (or failed to learn) the patterns either by attending to "statistics" or to "rules" -- though every algorithm that I can think of, in this case, could be described using either word, depending on taste.
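Here's one such algorithm, sketched in Python (the syllable sets are from the paper as quoted above; the string generator and the one-feature classifier are my own illustration -- set membership stands in for hearing the voice as female or male). Once each syllable is reduced to its voice class, a single count separates the two grammars:

```python
import random

A = ["ba", "di", "yo", "tu", "la", "mi", "no", "wu"]  # female voice
B = ["pa", "li", "mo", "nu", "ka", "bi", "do", "gu"]  # male voice

def make_string(pattern, n):
    # (AB)^n or A^n B^n, sampling without replacement within each class
    a, b = random.sample(A, n), random.sample(B, n)
    return [x for pair in zip(a, b) for x in pair] if pattern == "(AB)^n" else a + b

def class_changes(syllables):
    # count adjacent transitions between the two voice classes
    classes = ["A" if syl in A else "B" for syl in syllables]
    return sum(c1 != c2 for c1, c2 in zip(classes, classes[1:]))

for pattern in ["(AB)^n", "A^nB^n"]:
    s = make_string(pattern, 3)
    print(pattern, s, class_changes(s))
# (AB)^n strings have 2n-1 class changes; A^nB^n strings have exactly one --
# call that threshold test a "rule" or a "statistic", as you prefer
```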

Posted by Mark Liberman at February 25, 2006 04:27 PM