June 27, 2007

Problems with the Wikipedia Logo

There has been quite a bit of controversy over the reliability of Wikipedia. Now, adding insult to injury, people are complaining about errors in the Wikipedia logo, and this has been reported in The New York Times. The reason that it is possible to talk about errors in a logo is that the Wikipedia logo incorporates the word "Wikipedia" written in a variety of writing systems. Some of the spellings are wrong.

One of the errors that has elicited complaints is in the Hindi spelling. Hindi has no /w/, so the consonant /v/ is used. The problem is that in the Devanagari writing system consonants are written with stand-alone symbols, but a vowel immediately preceded by a consonant within the same syllable is written with a diacritic that attaches to the phonologically preceding consonant, in a position that varies with the vowel. Here, for example, are /va:/, /vi:/, /vu/, and /vu:/: वा वी वु वू. In the first two, the vowel diacritic follows /v/. In the second two, it goes underneath. The short /i/ diacritic, however, is written before the consonant that it goes with. In Unicode, U+0935 DEVANAGARI LETTER VA precedes U+093F DEVANAGARI VOWEL SIGN I, but the combination is to be rendered with the /i/ part preceding the /v/ part. Here is the Unicode sequence: वि. If the vertical bar with a tail coming off the top and continuing to the right appears to the right of the consonant, your browser, like many other rendering engines, is not rendering this properly. Here is what /vi/ should look like:

[Image: Devanagari /vi/]

Here is what it looks like when misrendered, as in the Wikipedia logo:

[Image: Misrendered Devanagari /vi/]
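
For readers who want to check the stored order for themselves, here is a minimal Python sketch using the standard unicodedata module. The helper name describe is my own invention for illustration. It shows only the order in which the characters are stored, consonant first and vowel sign second; putting the vowel sign to the left is the job of the rendering engine, which this code does not touch.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The sequence for /vi/: the consonant is stored first, the vowel sign second,
# even though the vowel sign is drawn to the left of the consonant.
vi = "\u0935\u093F"

for code, name in describe(vi):
    print(code, name)
# U+0935 DEVANAGARI LETTER VA
# U+093F DEVANAGARI VOWEL SIGN I
```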

The other problem that has attracted attention is of a different nature. It concerns the Japanese spelling of Wikipedia. What we see in the logo begins like this: ワィ. The first character is the katakana symbol for /wa/. The second symbol is the subscript version of the vowel /i/. The proposed change is to ウィ. Here the second character is the subscript /i/ as before, but the first character is the vowel /u/. What is going on?

The problem is that the sound system of Japanese does not permit any vowel other than /a/ to follow the consonant /w/. We can have /wa/, but not */wi/, */we/, */wo/, or */wu/. Where the morphology creates such sequences, the /w/ is deleted. That is why we have alternations like /kau/ "buys" and /kao:/ "let's buy" with /kawanai/ "does not buy". The stem of the verb "to buy" is /kaw/, but the /w/ disappears before suffixes that begin with vowels other than /a/. Since Japanese does not permit the sequence of /w/ followed by any vowel other than /a/, there is a kana letter for /wa/ but not for the other sequences.
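
The deletion rule can be sketched in a few lines of Python. This is a toy illustration, not a model of Japanese morphology: the function name attach_suffix is hypothetical, and the segmentation of the three forms into the stem /kaw/ plus the suffixes /u/, /o:/, and /anai/ is my assumption for the purposes of the example.

```python
# Vowels before which stem-final /w/ is deleted: every vowel except /a/.
VOWELS_OTHER_THAN_A = set("ieou")

def attach_suffix(stem, suffix):
    """Join stem and suffix, deleting stem-final /w/ before a vowel other than /a/."""
    if stem.endswith("w") and suffix[:1] in VOWELS_OTHER_THAN_A:
        stem = stem[:-1]
    return stem + suffix

print(attach_suffix("kaw", "u"))     # kau      "buys"
print(attach_suffix("kaw", "o:"))    # kao:     "let's buy"
print(attach_suffix("kaw", "anai"))  # kawanai  "does not buy"
```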

The reason that there are characters for sequences of a consonant and a vowel is that the two phonological writing systems of Japanese, hiragana and katakana, are moraic writing systems. That is, they are based on a segmentation of the utterance not into individual sound segments but into the units known to phonologists as moras. To a first approximation, a mora is the thing of which a light syllable has one and a heavy syllable has two. For example, a Japanese syllable like /ho/ consists of one mora, while /ho:/, with a long vowel, and /hon/, with a final nasal, each consist of two moras. The basic rule in Japanese is that there is one kana symbol per mora. Thus, /ho/, /ho:/, and /hon/ are written ホ, ホー, and ホン respectively. (Since the Wikipedia logo is in katakana, I will limit myself to katakana here. Hiragana is structurally almost the same.) The second symbol in /ho:/ marks a long vowel, while the second symbol in /hon/ marks a syllable-final nasal.
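
The one-symbol-per-mora rule makes counting moras in a katakana string nearly trivial, as this rough Python sketch suggests. The function name count_moras is made up for the example. The one refinement is an assumption of mine that goes beyond the basic rule: the small kana for vowels and glides are treated as modifying the preceding kana rather than as moras of their own, which is how the small /i/ behaves when ウィ is read as a single syllable.

```python
# Small kana assumed here to fuse with the preceding kana into one mora.
# (Small ッ is deliberately left out; it is usually counted as a mora.)
SMALL_KANA = set("ァィゥェォャュョ")

def count_moras(katakana):
    """Count moras as one per kana symbol, skipping the small kana above."""
    return sum(1 for ch in katakana if ch not in SMALL_KANA)

for word in ["ホ", "ホー", "ホン"]:
    print(word, count_moras(word))
# ホ 1
# ホー 2
# ホン 2
```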

A consonant followed by a short vowel or the first half of a long vowel or diphthong constitutes a single mora and so is written with a single, unanalyzable character. Thus, in the set: カ /ka/, キ /ki/, ク /ku/, ケ /ke/, コ /ko/ it is impossible to identify a part that represents /k/ and parts that represent /a/, /i/, /u/, /e/, and /o/. One consequence of using a writing system of this type is that you can't necessarily write any combination of consonants and vowels that occurs in the language: a separate character must be constructed for each mora, and in particular, for each CV pair.

The restrictions on /w/-vowel sequences are a fairly recent historical innovation. A few centuries ago, Japanese allowed /w/ before every vowel but /u/. Naturally, there were kana characters for these other sequences: ヰ /wi/ ヱ /we/ ヲ /wo/. The first two are no longer used at all. The last is used, but in effect as a morphogram, to write the accusative case marker, which is just /o/.

The problem posed by Wikipedia is then that the phonology of Japanese does not permit the sequence /wi/ and so provides no direct method of writing it. Conservative speakers of Japanese change the /wi/ of foreign words to the disyllabic sequence /ui/, that is, first an /u/, then an /i/. Less conservative speakers familiar with languages like English that have /wi/ may actually pronounce the sequence as a single syllable, but they are still confronted by the problem of how to write it. The traditional approach is to write the sequence using two vowel symbols, as if it were disyllabic. Naively, that would result in ウイ. In fact, I have seen spellings such as this. What the Japanese Wikipedians prefer, however, is ウィ, in which the small subscript version of the /i/ is used. This makes it clear that they really mean /wi/.

What of the erroneous spelling ワィ in the current Wikipedia logo? It looks like an attempt to use a different mechanism for writing CV sequences for which no kana letter exists. Another Japanese consonant that is restricted in its combination with following vowels is /f/, which, like /w/, occurs before only one vowel, in this case /u/. The only kana letter for an /fV/ sequence is therefore フ /fu/. However, foreign words with other vowels following /f/ have long been familiar to Japanese people, e.g. "film". These are written with フ followed by the small subscript version of the vowel, e.g. フィ /fi/, as in フィルム /firumu/ "film". ワィ is the result of applying the same principle to /w/.
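
The spellings under discussion differ by a single character each, which is easy to miss at small sizes. The following Python sketch lists the Unicode character names for each one so that the differences are explicit. The dictionary labels and the helper name names are mine, chosen for illustration.

```python
import unicodedata

# Labels are descriptive only; the spellings are the ones discussed in the post.
SPELLINGS = {
    "current logo": "ワィ",
    "proposed": "ウィ",
    "naive disyllabic": "ウイ",
    "model for the logo, /fi/": "フィ",
}

def names(text):
    """Return the Unicode character name of each character in text."""
    return [unicodedata.name(ch) for ch in text]

for label, spelling in SPELLINGS.items():
    print(f"{label}: {spelling} = {' + '.join(names(spelling))}")
```

Two things to watch for in the output: the small /i/ is a separate character (KATAKANA LETTER SMALL I) from the full-size one (KATAKANA LETTER I), and the Unicode name for フ is KATAKANA LETTER HU, since the character names follow a different romanization from the /fu/ used here.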

Correction: The original post contained a garbled sentence about the status of /f/, which I have corrected. The original stated that /f/, like /w/, occurs only before /a/. Actually, /f/, like /w/, occurs only before one vowel, but in the case of /f/ it is /u/, not /a/.

Posted by Bill Poser at June 27, 2007 03:11 AM