January 15, 2004

The curious case of quasiregularity

A few days ago, I got a note from Mark Seidenberg, commenting on an earlier piece in which I cited his observation that quasiregularity is ubiquitous in language. I described a controversy among psycholinguists about where linguistic quasiregularity comes from, and Mark took me to task for leaving out some key points. With his permission, I'm reprinting his note below (in blue), interspersed with my own comments.

Hi Mark. I came across the discussion of quasiregularity (quasi-regularity?) in Language Log. Thanks for mentioning it, and positively! I think there's an important point that was misunderstood, however. The quasiregularity notion emphasizes statistical notions of language, in particular in the mappings between codes (e.g. spelling and pronunciation; the semantics of a verb and how it is realized phonologically in a language; the partial correlations between form and meaning that seem to give rise to morpheme-like units, etc.).

As you said, we think these regularities emerge in connectionist networks that learn via statistical learning procedures. However, the Pinker theory does not maintain the same notion of quasiregularity but merely suggests that it has a different basis. To the contrary, he holds to a strict dichotomy between rule-governed forms and exceptions (in keeping with the traditional approach to the lexicon in which there are rules for generating various forms and the lexicon encodes the unpredictable information). Thus, the two types of forms are said to exhibit different structural properties, are learned by different mechanisms (rule-induction, rote memorization), produce different behavioral effects (e.g., exceptions are affected by frequency, rule-governed forms are not), and are represented in different parts of the brain (leading to "selective" impairments in generating one or the other type of form).

I agree, and I think Pinker, Ullman and others would as well. But I was arguing against the odd (though widespread) notion that linguistic irregularity is immoral, and I wanted to stress that everyone's psycholinguistic theory has mechanisms for creating morphophonological patterns but also mechanisms that tend to prevent these patterns from being perfectly regular.

For Seidenberg, the two mechanisms are the same, and he makes an argument (which I find persuasive) that quasiregularity is both ubiquitous as a descriptive fact and also inevitable as a theoretical consequence of a connectionist approach. From the perspective of people like Pinker and Ullman (and many others before them), irregularity arises because linguistic knowledge comes in two flavors, analogous to the distinction between semantic and procedural memory. I also find this idea somewhat persuasive: at least the mechanisms (and brain systems) involved in knowing a word and in putting words together in a sentence seem to be different.

The two-mechanism theory can account for similarities among the irregular forms (sing-sang, ring-rang, etc.), via the "associative net" that has been part of the story since Pinker and Prince 1988. What it can't do is capture the similarities between rule-governed forms and exceptions, which are pervasive. As you know, most of the irregular past tenses share structure with rule-governed forms; sent, hit, sat, bid, hid, and so on all end in a phoneme (for lack of a better word) that is one of the realizations of the regular past tense; exceptions like SLEPT pattern with regulars like STEPPED; the problem isn't with application of the rule but with deformation of the stem, and so on. I know you know this (as did Halle and Mohanan). Anyway, it is easy for our approach to capture these partial regularities, which arise from a variety of sources and can be encoded to whatever degree they are present by a multilayer net trained using a statistical learning procedure such as backprop. Pinker has to treat the overlap between rule-governed forms and exceptions as coincidental at best, or the detritus of historical events like diachronic change (which of course I think can also be handled by the same mechanisms we use for acquisition and processing).
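To make the overlap described in the note above concrete, here is a minimal sketch. It is only an illustration, not Seidenberg's network or Pinker's rule system. It assumes a deliberately rough set of sound classes, and it uses the final letter of the spelled form as a stand-in for the final phoneme, which happens to work for the examples in the note but not in general. The function names and the sound classes are hypothetical.

```python
# Rough sound classes: a simplifying assumption, not a full English inventory.
VOICELESS = {"p", "k", "f", "s", "sh", "ch", "th"}
ALVEOLAR_STOPS = {"t", "d"}


def regular_past_allomorph(final_sound: str) -> str:
    """Pick the regular past-tense ending from the stem's final sound.

    Stems ending in t/d take the syllabic ending (written "id" here),
    stems ending in another voiceless sound take "t",
    and everything else takes "d".
    """
    if final_sound in ALVEOLAR_STOPS:
        return "id"
    if final_sound in VOICELESS:
        return "t"
    return "d"


def ends_like_regular_past(past_form: str) -> bool:
    """True if the spelled past form ends in t or d.

    Spelling is used as a crude proxy for pronunciation (an assumption).
    """
    return past_form[-1] in ("t", "d")


# Irregular past forms cited in the note above.
IRREGULARS = ["sent", "hit", "sat", "bid", "hid", "slept"]

# Irregulars whose final segment matches a realization of the regular ending.
overlapping = [w for w in IRREGULARS if ends_like_regular_past(w)]
```

On this crude measure every irregular in the list ends the way a regular past tense does, while a vowel-change form like "sang" does not. A strict words-versus-rules split has nothing to say about that overlap, which is the point at issue.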

Mark is making a very important point here, highlighting something that has always puzzled me about the attitude of linguists towards the Pinker-Seidenberg controversy. To explain why this attitude is so weird, I have to beg your indulgence for a little intellectual history.

As of 1950 or so, a common (though not at all universal) view among American structuralist linguists was that the phonemic level of representation had to be connected to surface (phonetic) forms by relatively simple and transparent principles, without lexical or morphological exceptions. Other principles, qualitatively different in character and usually seen as nothing more than a sort of fossilized residue of older sound changes, related morphemes to phonemes. In the middle 1950s, a (then) young linguist named Morris Halle challenged this view, by showing (or let's say, by arguing persuasively) that these principles of analysis, applied to Russian and other Slavic languages, missed important generalizations about palatalization, which had to be seen as applying both "before" and "after" the structuralist phonemic level. The key argument was that "phonology" and "morphophonology" share sound patterns, and so should be merged into one system.

This was the opening salvo, on the phonological front, in the battle of "generative" linguists against their structuralist parents. The structuralists were quickly defeated, or perhaps I should say were driven from the plains and took refuge in a few mountain fortresses on Ivy League campuses and in Anthropology departments. The underlying issues -- how the sound patterns of morphemes, words and phrases are represented and inter-related -- of course remained to be investigated, and there have been many intricate and interesting campaigns ranging across this intellectual countryside over the past half century.

However, on the particular point in question, Halle has completely carried the day. I don't know any working phonologists today who think that morphologically-conditioned (or otherwise "irregular") sound patterns are ipso facto distinct in every way from perfectly transparent ones. There may be some, but they have roughly the status within linguistics of biologists who think that AIDS is not caused by HIV, or cosmologists who think that the big bang never happened. That doesn't mean they're wrong, just that they're intellectually isolated.

And then there's Steve Pinker. He might not be a card-carrying phonologist, since his degree is in psychology, and he doesn't design (or even use) the sorts of systematic, formal descriptions that are phonologists' stock in trade. But he wrote a whole book on one aspect of English morphophonemics -- Words and Rules -- and a carefully wrought, enormously entertaining book it is. A few years ago, Pinker gave a seminar on this book's material to a bunch of literary scholars here at Penn, and by the end of the hour, they were all vying with one another to supply examples and counterexamples. I could hardly believe it; it was sheer genius.

Anyhow, Pinker's theory is structuralist phonemics reborn. 'Rules' must be perfectly regular (though he has not been very specific about exactly what these rules are, how they can interact, what they can do and what they can't do); any forms whose derivation is not perfectly regular are 'words', which are stored and retrieved by a memory system completely disjoint from the rule system. If there appear to be processes in common between the rule system and the word system, it's just because the rule system long ago created some patterns that have been left as fossilized imprints on stored words. The storage system for words is seen as a connectionist one, but its emergent regularities are fundamentally different from the patterns created by the system of rules.

Mark Seidenberg, on the other hand, upholds Morris Halle's view that "morphophonemic" sound patterns can be (and typically are) made of exactly the same stuff as sound patterns that happen to be phonologically transparent. And what reward does he get for this? Phonologists root for Pinker, and psychologists choose sides in part based on what they think about modern phonology, with Seidenberg typically seen as the champion of those who would like to overthrow the whole generative-linguistics enterprise. There's a lot more to be said about this whole area, but believe me, it gets even stranger.

Mark's note to me continued:

You might be interested in the recent paper by Haskell, MacDonald and myself, which addresses the rat-eater *rats-eater cases, where the contrast between graded quasiregularity and the Pinker alternative is illustrated clearly (on the http://lcnl.wisc.edu website).

The paper Mark is referring to is here. His note continues:

So, it matters whether you think the system is quasiregular (graded, statistical, exhibits partial regularities) or not (rule governed vs. exception). Thus the disagreement is about the basic nature of the system, not different ways of explaining quasiregularity.

I think the same issue arises repeatedly in the characterization of linguistic knowledge, e.g., morphemes and phonemes that are not discrete beads on a string, sentences that are ungrammatical but overlap with grammatical ones, and so on.

I agree. On the other hand, the connectionist work has not yet produced a very convincing account of syntax (in the sense of general recursive compositionality). I'm impressed by some of Michael Ullman's recent work, elaborating and reinforcing older ideas about the linguistic role of frontal motor-skill circuits vs. posterior semantic memory systems. So I hold open the possibility that the McClelland/Seidenberg and Pinker/Ullman views might both turn out to be partly right about language in general, whoever turns out to win on the particular question of the English past tense.

I think that linguists in general, and phonologists in particular, should pay more attention to the details of this debate. They should press Pinker and his allies for more precise definitions of what "regular" means, and they should think hard about whether they're really willing to re-fight the battle of the two Slavic palatalizations, but on the other side this time.

Mark and I had a further exchange about syntax, which I'll post later.

[Update: In this 1991 Linguist List posting, Bruce Nevin describes Halle's argument against the (structuralist) phoneme in some detail. I haven't checked any sources, but his summary rings true to me.

Aside from the details, there are two inter-related questions that are not always clearly separated -- certainly I didn't separate them in the discussion above. One question is how simple and transparent the phonemic/phonetic relationship is, and the other is whether morphological exceptionality is allowed as an integral part of that relationship. The questions are connected because morphological conditioning of sound patterns would automatically make them "irregular" by most definitions; and on the other hand one can nearly always encode (apparent or real) lexical exceptions by making underlying forms more abstract.

Bruce also makes explicit what I hinted at, namely that some structuralists (such as Bloomfield) were on the Seidenberg/Halle side rather than the Bloch/Pinker side.

In the end, I think that my point stands. Phonologists from Sapir, Bloomfield and Harris to Halle, McCarthy and Prince have believed that some quasiregular sound patterns are part of the same phonological system as completely regular sound patterns. In this, they are on Seidenberg's side against Pinker, whose team includes Trager/Smith, Bloch and some other structuralists, along with (some variants of) "natural phonology".

I don't have any deep convictions on this question. But if Pinker is right, most phonologists working today need to repudiate most of their own work. Maybe he's right, and they should -- but I don't see how they can applaud politely for his side of the debate, and then go on doing their own stuff as if he were wrong.]

Posted by Mark Liberman at January 15, 2004 09:41 AM