Merry Christmas to our readers! Some seasonally-appropriate reading from past editions:
"Same-sex Mrs. Santa: 'The semantics are confusing'" 11/27/2003
"'Twas the night before Christmas" 11/24/2003
"A 'Boxing Day Election' -- or not?" 12/5/2004
"Talking animals: Miracle or curse?" 12/24/2004
"Homo Hemingwayensis" 1/9/2005
"For linguists only" 2/4/2005
"Christmas trees and holiday trees" 12/2/2005
"Negation, over- and under-" 12/21/2005
"L(a)ying snow" 12/24/2005
"Zogby: Bill O'Reilly's bitches?" 12/22/2006
In other holiday news, a new survey by Language Log labs has found that Hanukkah is second only to Muammar al-Gaddafi in public spelling uncertainty.
We learned of this problem by data-mining the web. Ignoring case, here are some of the counts:

| | hanukkah | hanukah | hannukah | hannukkah | hanukka | hanuka | hannuka | hannukka |
|---|---|---|---|---|---|---|---|---|
| Google | 24,100,000 | 1,160,000 | 1,430,000 | 85,200 | 194,000 | 957,000 | 125,000 | 9,540 |
| Yahoo | 55,900,000 | 56,600,000 | 57,200,000 | 71,200 | 33,600,000 | 55,000,000 | 126,000 | 2,010 |
| MSN | 2,097,292 | 537,348 | 159,167 | 12,823 | 21,469 | 39,290 | 9,031 | 1,352 |

| | chanukkah | chanukah | channukah | channukkah | chanukka | chanuka | channuka | channukka |
|---|---|---|---|---|---|---|---|---|
| Google | 461,000 | 5,380,000 | 560,000 | 975 | 359,000 | 835,000 | 3,040 | 697 |
| Yahoo | 33,800,000 | 38,600,000 | 33,900,000 | 1,750 | 291,000 | 33,200,000 | 35,200 | 1,320 |
| MSN | 56,078 | 767,919 | 46,153 | 638 | 24,662 | 52,053 | 4,282 | 577 |

(Note that Yahoo is almost certainly doing some curious sort of "query expansion".)
The orthographic background of this problem is discussed in the Wikipedia article, from which I learned about Khanike, the "YIVO standard transliteration from the Yiddish and/or Ashkenazic pronunciation of the Hebrew", which has 10,300 Google hits, and also about Robert Siegel's entertaining and informative exploration of the issues on All Things Considered last year.
In our survey results, 31.2% of the American public claimed to know how to spell Hanukkah, while 63.4% said they had no clue, and 5.4% responded that "it's people like you who are ruining Christmas". When we asked those who claimed to know the spelling what it actually is, we got 11 different versions from the 15 people who actually made it through to the end of the word. A typical response from the others: "Hey, man, what is this, fifth grade?"
For those who care about such things, the entropy of the MSN distribution is almost exactly 2 bits, corresponding to the amount of uncertainty in four equally likely alternatives.
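The entropy figure is easy to check. Here is a minimal Python sketch, assuming that "the MSN distribution" means the MSN counts for all sixteen spellings in the tables, pooled into a single distribution (the post doesn't say exactly which spellings went in). `MSN_COUNTS` and `entropy_bits` are names made up for this illustration.

```python
import math

# MSN hit counts for the sixteen spellings in the tables above.
# Assumption: "the MSN distribution" pools all sixteen of them.
MSN_COUNTS = {
    "hanukkah": 2_097_292, "hanukah": 537_348, "hannukah": 159_167,
    "hannukkah": 12_823, "hanukka": 21_469, "hanuka": 39_290,
    "hannuka": 9_031, "hannukka": 1_352,
    "chanukkah": 56_078, "chanukah": 767_919, "channukah": 46_153,
    "channukkah": 638, "chanukka": 24_662, "chanuka": 52_053,
    "channuka": 4_282, "channukka": 577,
}


def entropy_bits(counts):
    """Shannon entropy H = -sum(p * log2(p)), in bits, computed from raw counts."""
    counts = list(counts)  # allow any iterable, including generators
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(f"MSN spellings: {entropy_bits(MSN_COUNTS.values()):.3f} bits")
print(f"Four equally likely choices: {entropy_bits([1, 1, 1, 1]):.3f} bits")
```

With these counts the first line should come out very close to 2, and four equally likely alternatives give exactly 2 bits (4 × 1/4 × log2 4). The sixteen spellings carry no more uncertainty than that because about 89% of the MSN hits fall on just three of them: hanukkah, chanukah and hanukah.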
[Several readers have pointed out that it's strange that the only consistent part in the many common spellings of this word is the vowel sequence 'a u a', which is also the only part that isn't specified by the Hebrew orthography (heth nun vav kaf hey). Others have pointed out that this is completely expected, given that the name of the first letter itself has the common variants Ḥet, H̱et, Khet, Kheth, Chet, Cheth, Het, and Heth. And then there are those who have pointed to additional variants in which the vowels are also altered, like "Hanakah" (13,200 Google hits). Well, as Don Rumsfeld said about the looting of Baghdad, "Freedom's untidy".]
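The 'a u a' observation can be checked mechanically against the sixteen spellings in the tables. This is a small illustrative sketch. The helper names are hypothetical, and the code treats only the letters a, e, i, o, u as vowels.

```python
# The sixteen spellings tabulated above (lowercase, since the counts ignore case).
SPELLINGS = [
    "hanukkah", "hanukah", "hannukah", "hannukkah",
    "hanukka", "hanuka", "hannuka", "hannukka",
    "chanukkah", "chanukah", "channukah", "channukkah",
    "chanukka", "chanuka", "channuka", "channukka",
]

VOWELS = set("aeiou")


def vowel_sequence(word):
    """Keep only the vowel letters of a word, in order."""
    return "".join(ch for ch in word if ch in VOWELS)


def consonant_skeleton(word):
    """Keep only the non-vowel letters of a word, in order."""
    return "".join(ch for ch in word if ch not in VOWELS)


print(sorted({vowel_sequence(w) for w in SPELLINGS}))  # ['aua']
print(len({consonant_skeleton(w) for w in SPELLINGS}))  # 16
```

Every spelling reduces to the same vowel sequence. All of the variation is in the consonants: h versus ch, single versus doubled n and k, and the optional final h.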
Posted by Mark Liberman at December 24, 2006 08:22 AM