June 08, 2004

Liberal gemination

Gene Buckley's example of gemination-swap in dissapointed (in response to these other posts on the topic) hits on what may be a general "principle" of English spelling: given a choice, doubled consonants prefer to come early in the word. This effect is seen most clearly in cases where conservation of geminates is violated: enemy is more often written ennemy than enemmy, accommodate shows up more often as accomodate than acommodate, and so on.

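The following charts illustrate this principle, with raw Google hits. Rows give the first consonant (C1) as singleton or doubled, and columns give the second consonant (C2). We see that when spellings fail to hit the correct target (shaded dark grey in the original charts), they tend to slip down and to the left (first consonant doubled, second singleton).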

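| enemy | C2: m | C2: mm |
| --- | --- | --- |
| C1: n | 8,950,000 | 349 |
| C1: nn | 14,500 | 1 |

| eradicate | C2: d | C2: dd |
| --- | --- | --- |
| C1: r | 706,000 | 13 |
| C1: rr | 6,910 | 1 |

| accommodate | C2: m | C2: mm |
| --- | --- | --- |
| C1: c | 3,150 | 5,160 |
| C1: cc | 587,000 | 5,740,000 |

| assassinate | C2: s | C2: ss |
| --- | --- | --- |
| C1: s | 7,090 | 617 |
| C1: ss | 13,600 | 243,000 |

| commission | C2: s | C2: ss |
| --- | --- | --- |
| C1: m | 370,000 | 173,000 |
| C1: mm | 32,500,000* | 50,500,000 |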

*The hits for commision are wildly inflated here, because many sites are linked to with misspelled links, but don't actually contain the misspelling themselves. I imagine this is a problem for all of these counts, but it is especially dramatic here. (Thanks to Paula Aden for pointing out the reason behind this strange behavior to me.)
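The layout of these charts can be sketched in a few lines of Python. This is only an illustration: the helper `gemination_grid` and its template argument are hypothetical names introduced here, not anything used to gather the counts. It builds the four spellings for a pair of consonants and, using the enemy counts from the chart, picks out the most frequent misspelling.

```python
from itertools import product


def gemination_grid(template, c1, c2):
    """Map (c1_doubled, c2_doubled) to the spelling with those doublings.

    `template` is a str.format template with {c1} and {c2} placeholders,
    e.g. "e{c1}e{c2}y" for enemy. (Hypothetical helper for illustration.)
    """
    return {
        (d1, d2): template.format(
            c1=c1 * (2 if d1 else 1),
            c2=c2 * (2 if d2 else 1),
        )
        for d1, d2 in product((False, True), repeat=2)
    }


# Google hit counts copied from the enemy chart, keyed the same way.
enemy_hits = {
    (False, False): 8_950_000,  # enemy (correct)
    (False, True): 349,         # enemmy
    (True, False): 14_500,      # ennemy
    (True, True): 1,            # ennemmy
}

grid = gemination_grid("e{c1}e{c2}y", "n", "m")
correct = (False, False)

# Drop the correct spelling, then find the most frequent remaining cell.
misspellings = {cell: n for cell, n in enemy_hits.items() if cell != correct}
top_cell = max(misspellings, key=misspellings.get)
print(grid[top_cell], misspellings[top_cell])  # prints: ennemy 14500
```

The winning cell is (C1 doubled, C2 singleton), which is the "down and to the left" position in the chart.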

If the word has just the second consonant doubled, then "conservation of geminates" should conspire with "geminate early" to make the doubling shift to the first consonant. Indeed, this does often occur—though global degemination is also very common.

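| resurrection | C2: r | C2: rr |
| --- | --- | --- |
| C1: s | 65,800 | 2,680,000 |
| C1: ss | 98,700 | 7,900 |

| recommend | C2: m | C2: mm |
| --- | --- | --- |
| C1: c | 600,000 | 30,900,000 |
| C1: cc | 300,000 | 115,000 |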

For words that already obey the "geminate early" rule, switching the doubling to the second consonant is certainly not unusual (this is the Karttunen > Kartunnen error). For many words, however, the most common misspelling seems to be to violate conservation of geminates and write the word with no doubled consonants at all (upper left corner).

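| attitude | C2: t | C2: tt |
| --- | --- | --- |
| C1: t | 409,000 | 2,820 |
| C1: tt | 9,670,000 | 2,160 |

| imminent | C2: n | C2: nn |
| --- | --- | --- |
| C1: m | 9,870 | 29 |
| C1: mm | 1,690,000 | 87 |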

Another one of my favorites is mayonnaise, which is sometimes written mayonaisse (but more often mayonaise):

| mayonnaise | C3: s | C3: ss |
| --- | --- | --- |
| C2: n | 94,000 | 7,330 |
| C2: nn | 663,000 | 886 |

Amusingly, gemination can even spread to the y: mayyonaise (42 hits) or mayyonnaise (20 hits); but no misspelling of this word is nearly as common as simply degeminating across the board (mayonaise), in violation of conservation of geminates.

The preference for degemination is also true of many of the other examples that have been discussed in previous posts. Dissapointed may show up 288,000 times on Google, but disapointed gets a whopping 1,360,000 hits. The same point is made by the Jennifer/Jenifer/Jeniffer data, in which Jenifer outnumbers Jeniffer by a ratio of 4:1. In fact, it's not clear that conservation of geminates is much of an effect at all, given that the same effect could be achieved by the independent forces of (1) global degemination (recommend > recomend), and (2) spontaneous first consonant gemination (enemy > ennemy, recomend > reccomend). There's something intuitively appealing about the idea (encouraged, perhaps, by the number of times that we hear utterances like "that's with 1 c and 2 m's"), but more data is needed. The particular consonants involved could be playing a role here, too.
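As a rough check on how strong the degemination preference is, here is a minimal sketch that compares degeminated and swapped spellings for three words whose correct form doubles only the second consonant. The counts are the Google figures quoted in this post; the function name and the dictionary layout are assumptions made for the example.

```python
# (degeminated hits, swapped hits); figures are the counts quoted in the post.
pairs = {
    "disappointed": (1_360_000, 288_000),  # disapointed vs. dissapointed
    "recommend": (600_000, 300_000),       # recomend vs. reccomend
    "resurrection": (65_800, 98_700),      # resurection vs. ressurection
}


def degemination_to_swap_ratio(degeminated, swapped):
    """How many degeminated spellings turn up per swapped spelling."""
    return degeminated / swapped


ratios = {
    word: degemination_to_swap_ratio(*counts) for word, counts in pairs.items()
}
for word, ratio in ratios.items():
    print(f"{word}: {ratio:.2f}")
```

A ratio above 1 means degemination outnumbers the swap. On these counts disappointed and recommend come out above 1, while resurrection falls below it, so the preference is a tendency rather than a rule.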

Interestingly, this same overall pattern (general preference for degeminating, followed by a noticeable occurrence of "gemination swap") is also discussed by Badecker (1996) for dysgraphic patients. (See the reference below.)

Finally, it should be noted that some words genuinely do seem to obey the Karttunen generalization, with gemination preferentially switching from C1 to C2. One such word is cinnamon:

| cinnamon | C2: m | C2: mm |
| --- | --- | --- |
| C1: n | 17,600 | 149,000 |
| C1: nn | 2,010,000 | 645 |

The question of why cinnamon is different from mayonnaise is left as a matter for future research.
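To make the contrast concrete, the sketch below takes the four charts for words whose correct spelling doubles only the first consonant of the pair and reports which misspelling type dominates. The labels "degeminated", "swapped", and "both doubled" are names chosen for this example, and the counts are copied from the charts above.

```python
# Misspelling counts from the charts, one entry per incorrect cell.
charts = {
    "attitude": {"degeminated": 409_000, "swapped": 2_820, "both doubled": 2_160},
    "imminent": {"degeminated": 9_870, "swapped": 29, "both doubled": 87},
    "mayonnaise": {"degeminated": 94_000, "swapped": 7_330, "both doubled": 886},
    "cinnamon": {"degeminated": 17_600, "swapped": 149_000, "both doubled": 645},
}


def dominant_error(cells):
    """Return the most frequent misspelling type and its share of all misspellings."""
    total = sum(cells.values())
    kind = max(cells, key=cells.get)
    return kind, cells[kind] / total


for word, cells in charts.items():
    kind, share = dominant_error(cells)
    print(f"{word}: {kind} ({share:.0%} of misspellings)")
```

On these counts, cinnamon is the only one of the four where the swapped spelling beats the degeminated one.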


References:

Badecker, William (1996) Representational properties common to phonological and orthographic output systems. Lingua 99, 55-83.

Posted by Adam Albright at June 8, 2004 05:54 AM