About two years ago, I posted (here, here and here) about Tecumseh Fitch and Marc Hauser's article "Computational Constraints on Syntactic Processing in a Nonhuman Primate". Last month, Geoff Pullum gave a talk here at Penn, under the title "Monkey Syntax", about some recent work with Jim Rogers on "some not very well known mathematical results which appear to be highly relevant to ongoing experimental work on precursors to syntax in non-human primates". This got me thinking about the questions again. My thinking has been considerably clarified by discussions with Geoff and with Barbara Scholz during their recent visit, and with Jim Rogers over breakfast yesterday. I've jotted down a few notes, which you can read after the jump, if you're interested in such things.

You'll recall that F&H tested the ability of two primate species -- humans and cotton-top tamarins -- to detect novelty in sequences of spoken syllables generated by grammars belonging to different mathematical classes. According to their abstract:

The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

I suggested at the time that this overinterprets their results. The particular stringsets used in their experiment were finite, and in fact contained only two short strings each, if their terminal vocabulary is divided into the high-pitched female-spoken words and the low-pitched male-spoken words that defined the two grammatically-relevant classes of syllables that they used. Specifically, I suggested this alternative interpretation of their experiments:

Given exposure to instances of the patterns ABAB and ABABAB, tamarin monkeys showed increased interest in patterns AABB and AAABBB, perhaps because these contained two to four copies of the salient (because repeated) two-element sequences (bigrams) AA and BB, which they had not heard before. By contrast, given exposure to instances of the patterns AABB and AAABBB, other tamarins did not show significantly increased interest in the patterns ABAB and ABABAB, perhaps because they contained only one or two copies of the previously-unheard bigram BA, which may also be less salient because it does not involve a repetition.

Given the same stimulus sequences, human subjects were able to categorize the new patterns as different, regardless of the direction of training and testing, perhaps because their threshold for noting statistical sequence differences was lower, and perhaps because they were able to remember longer sequences, thus noting that the training material AABB and AAABBB did not contain the four-element sequence ABAB.
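The bigram counts in that alternative interpretation can be checked mechanically. Here is a minimal Python sketch, assuming each stimulus is written as a string over the two syllable classes A and B; the function names are made up for illustration and are not from F&H's paper.

```python
from collections import Counter

def bigrams(pattern):
    """Return the adjacent two-element substrings of a pattern string."""
    return [pattern[i:i + 2] for i in range(len(pattern) - 1)]

def novel_bigram_counts(training, test):
    """Count the bigrams in `test` that never occur in any training pattern."""
    seen = {bg for pattern in training for bg in bigrams(pattern)}
    return Counter(bg for bg in bigrams(test) if bg not in seen)

alternating = ["ABAB", "ABABAB"]   # patterns of the (AB)^n type
blocked = ["AABB", "AAABBB"]       # patterns of the A^n B^n type

# Familiarized on the alternating patterns, tested on the blocked ones.
for test in blocked:
    print(test, dict(novel_bigram_counts(alternating, test)))

# Familiarized on the blocked patterns, tested on the alternating ones.
for test in alternating:
    print(test, dict(novel_bigram_counts(blocked, test)))
```

In the first direction the test strings contain two and four previously unheard bigrams (AA and BB); in the other direction they contain only one and two (BA).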

The recent work by Pullum and Rogers aims to "provide an introduction to some interesting proper subclasses of the finite-state class, with particular attention to their possible relevance to the problem of characterizing the capabilities of language-learning mechanisms". These are important issues for anyone interested in understanding familiarization/discrimination experiments, and I'm glad to hear that Geoff and Jim have been talking with Marc Hauser and his students. But some simple familiarization/discrimination results may not be straightforwardly characterized in any grammatical terms at all. And in fact, I think there's good reason to think that the F&H experiments happen to have this property.

Suppose you heard someone reading a list of sequences of six numbers, something like this

73 30 73 30 73 30

97 53 97 53 97 53

42 38 42 38 42 38

. . .

and then another list, something like this

98 98 98 22 22 22

77 77 77 84 84 84

71 71 71 70 70 70

. . .

You'd have no difficulty in detecting that the second one exhibits a different pattern from the first. The same would be true if what you heard were sequences made of random English monosyllables instead of sequences made of random 2-digit integers:

bits field bits field bits field

cots brunt cots brunt cots brunt

wheat spooked wheat spooked wheat spooked

...

must must must foist foist foist

hug hug hug peal peal peal

squat squat squat cranes cranes cranes

...

The patterns here are patterns of equivalence across positions in strings of elements drawn from a vocabulary that might as well be infinite, given that none of the elements used ever recur within the experiment. As a result, the standard mechanisms of formal language theory don't give us any direct way to characterize the patterns that we nevertheless so easily recognize.

As a start towards a more general characterization of the kind of patterns under discussion, observe that there are only two possibilities for sequences of length 2, either that both elements are the same or that the second one is different from the first. We can symbolize these options as

AA

AB

(Note that "A" and "B" here stand for any tokens that we like -- the terminal vocabulary is infinite, or at least is limited only by the length of the signals we're willing to sit around to listen to, and the signal-to-noise ratio of the channel on which we're listening.)
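One way to make this notation concrete is to relabel the tokens of a sequence in order of first appearance. The following is a minimal Python sketch; the name `schema` and the optional `key` argument (for cases where equivalence is defined by a shared property rather than identity) are assumptions made for illustration.

```python
from string import ascii_uppercase

def schema(tokens, key=lambda tok: tok):
    """Relabel tokens as A, B, C, ... in order of first appearance.

    Two positions get the same letter exactly when key(token) is equal.
    This sketch supports at most 26 distinct classes.
    """
    labels = {}
    out = []
    for tok in tokens:
        k = key(tok)
        if k not in labels:
            labels[k] = ascii_uppercase[len(labels)]
        out.append(labels[k])
    return "".join(out)

print(schema("73 30 73 30 73 30".split()))           # ABABAB
print(schema("hug hug hug peal peal peal".split()))  # AAABBB
print(schema([7, 3, 10, 5], key=lambda n: n % 2))    # AABA (odd vs. even)
```

Because the labels depend only on which positions match, the two lists of number sequences and the two lists of monosyllable sequences reduce to the same two schemas.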

For length-3 sequences, there are 5 possibilities, which we can symbolize as:

AAA

AAB

ABA

ABB

ABC

More generally, for a sequence of length n, we're setting up equivalence classes among its positions, and the number of ways to do this for n = 1, 2, 3, ... gives the sequence of what are called "Bell's numbers", as Eric Weisstein at Mathworld explains:

The number of ways a set of n elements can be partitioned into nonempty subsets is called a Bell number and is denoted B_{n}. For example, there are five ways the numbers {1,2,3} can be partitioned: {{1},{2},{3}}, {{1,2},{3}}, {{1,3},{2}}, {{1},{2,3}}, and {{1,2,3}}, so B_{3}=5. B_{0}=1, and the first few Bell numbers for n=1, 2, ... are 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ... (Sloane's A000110).
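For readers who want to reproduce these values, here is a short Python sketch that uses the Bell triangle, one standard way of computing Bell numbers (it is not taken from the Mathworld entry). The function name `bell_numbers` is an assumption for illustration.

```python
def bell_numbers(count):
    """Return [B_0, B_1, ..., B_(count-1)] computed with the Bell triangle.

    Each row of the triangle starts with the last entry of the previous row,
    and every later entry is the sum of its left neighbour and the entry
    above that neighbour. The first entry of each row is a Bell number.
    """
    bells = []
    row = [1]
    for _ in range(count):
        bells.append(row[0])
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return bells

print(bell_numbers(11))
# [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```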

The "Bell" in question is Eric Temple Bell, who among his other accomplishments wrote Men of Mathematics, one of my favorite books when I was a child. It would be nice to have a new version without gender presuppositions, and with some additional sections.

The comments on sequence A000110 at Neil Sloane's magnificent *On-line Encyclopedia of Integer Sequences* include this one, which offers a different perspective on the relationship of Bell's numbers to patterns of equivalence classes in a sequence:

Number of distinct rhyme schemes for a poem of n lines: a rhyme scheme is a string of letters (eg, 'abba') such that the leftmost letter is always 'a' and no letter may be greater than one more than the greatest letter to its left. Thus 'aac' is not valid since 'c' is more than one greater than 'a'. For example, a(3)=5 because there are 5 rhyme schemes. aaa, aab, aba, abb, abc. - Bill Blewett (BillBle(AT)microsoft.com), Mar 23 2004
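The rule in that comment translates directly into a small generator. This is a minimal Python sketch, with the name `rhyme_schemes` chosen for illustration: each new letter may be any letter already used, or the next unused one, and nothing else.

```python
def rhyme_schemes(n):
    """Yield every rhyme scheme of length n, in alphabetical order.

    A scheme starts with 'a', and each later letter is at most one greater
    than the greatest letter to its left. This sketch assumes n <= 26.
    """
    def extend(prefix, used):
        # `used` is the number of distinct letters already in `prefix`.
        if len(prefix) == n:
            yield prefix
            return
        for k in range(used + 1):
            yield from extend(prefix + chr(ord("a") + k), max(used, k + 1))

    yield from extend("", 0)

print(list(rhyme_schemes(3)))
# ['aaa', 'aab', 'aba', 'abb', 'abc']
print([len(list(rhyme_schemes(n))) for n in range(1, 7)])
# [1, 2, 5, 15, 52, 203]
```

The counts match the Bell numbers because every partition of the positions corresponds to exactly one string of this kind.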

Short patterns of this type -- strings characterized in terms of position-wise equivalence classes of their elements -- are clearly very salient to humans. (And note that the equivalence classes can be defined by any salient shared properties, like "starts with [k]" or "is an odd integer".) Given two random schemes from among the 15 possible patterns of length 4, or the 52 possible patterns of length 5, I suspect that after being familiarized to the first pattern, subjects will easily discriminate it from instances of the second, even if none of the local elements used in the experiment ever occurs more than once.

As the patterns' length increases, this task will clearly become harder and harder -- unless the patterns to be discriminated happen to have rather different local properties. For example, if one length-12 pattern happens to start with AAABBB while the other one starts with ABCABC, the discrimination task will be trivial.

One way to model this would be to assume that subjects are sensitive to the statistics of equivalence-class properties of local substrings -- what we might call schematic n-grams -- just as they are sensitive to the statistics of conventional n-grams. This might be as simple as noting when adjacent symbol pairs are the same vs. different, or it might be based on progressively more complicated sorts of calculations, organized in the ways familiar to formal language theorists.
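As a rough illustration of what counting schematic n-grams could look like, here is a Python sketch. It is only one possible reading of the idea, not a model of what subjects actually do, and the names `schema` and `schematic_ngrams` are hypothetical.

```python
from collections import Counter
from string import ascii_uppercase

def schema(tokens):
    """Relabel tokens as A, B, C, ... in order of first appearance."""
    labels = {}
    for tok in tokens:
        labels.setdefault(tok, ascii_uppercase[len(labels)])
    return "".join(labels[tok] for tok in tokens)

def schematic_ngrams(tokens, n):
    """Count the schema of every length-n window in the token sequence."""
    return Counter(
        schema(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

alternating = "bits field bits field bits field".split()
blocked = "hug hug hug peal peal peal".split()

# With n=2 this is just "same vs. different" for adjacent pairs.
print(dict(schematic_ngrams(alternating, 2)))  # {'AB': 5}
print(dict(schematic_ngrams(blocked, 2)))      # {'AA': 4, 'AB': 1}
print(dict(schematic_ngrams(blocked, 3)))      # {'AAA': 2, 'AAB': 1, 'ABB': 1}
```

Even the simplest case, n=2, separates the two lists: one never contains an adjacent repetition, while the other consists mostly of them, although the two lists share no vocabulary at all.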

If this is on the right track, then formal language theory will help us understand this sort of auditory texture discrimination after all -- but we'll need to take a broader view of the vocabulary of the "language", and how it's related to the particular sequences that we use as stimuli.

[For some other ideas about interesting things that might be going on in experiments of this type, take a look at this review of research in visual texture perception. Also relevant, I think, is some of the work on mismatch negativity, which offers an alternative method of measuring the perceived novelty of auditory subsequences.]

Posted by Mark Liberman at February 9, 2006 09:33 AM