November 04, 2005

Dzongkha and Tsong-kha-pa, Voicing and Aspiration

Some readers are understandably having a hard time interpreting George van Driem's explanation, which I cited in a previous post, for the Chinese confusion between Dzongkha, the name of the national language of Bhutan, and Tsong-kha-pa, the name of the founder of the dGe-lugs-pa school of Buddhism, of which the current head is the Dalai Lama. I repeat the relevant portion here:

Such confusion could only arise in the minds of speakers of Mandarin Chinese or Tibetan who are not literate in either Tibetan or Dzongkha. Neither Mandarin Chinese nor Tibetan distinguishes phonologically between voiced and voiceless obstruent initials, unlike Dzongkha and, for example, English.

Van Driem's point turns on the distinction between voicing and aspiration. If the vocal folds vibrate during the production of a speech sound it is said to be voiced. Otherwise it is voiceless. When making the transition from a consonant to the following (voiced) vowel, the onset of voicing may occur immediately or it may be delayed by some amount. If voicing is delayed, the voiceless region at the beginning of the vowel is known as aspiration. The aspiration is the puff of air that you can feel if you wet your finger and hold it in front of your mouth when you say pot in English. To experience the contrast between aspiration and its absence, first say pot, then say spot. You'll notice that there is not much of a puff of air in spot but a noticeable one in pot.

If the consonant is truly voiced, the vocal folds will vibrate during the consonant, with the result that the voice onset occurs prior to the end of the consonant. The Voice Onset Time (VOT) is therefore said to be negative. If voicing starts up right at the transition from consonant to vowel, the VOT is 0. If voicing is delayed and there is aspiration, the VOT is positive. The result is that we can talk about voicing and aspiration as aspects of a single dimension of voice onset time.

Different languages divide the VOT continuum up in different ways. Some languages distinguish between voiced consonants, with negative VOT, and voiceless consonants, with zero VOT. These languages have a true voicing contrast. Other languages distinguish a relatively small (but non-negative) VOT from a larger VOT. These languages have an aspiration contrast. And some languages have three categories: negative (voiced), small (voiceless unaspirated), and large (voiceless aspirated).
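To make the three-way division concrete, here is a minimal sketch in Python that places a stop on the VOT continuum. The function name classify_vot and the 30 millisecond boundary between "small" and "large" VOT are my own assumptions for illustration. Real boundaries vary by language, by place of articulation, and by speaker.

```python
def classify_vot(vot_ms, aspiration_threshold_ms=30.0):
    """Place a stop consonant on the VOT continuum.

    vot_ms: voice onset time in milliseconds, measured relative to the
        release of the stop (negative means voicing starts before release).
    aspiration_threshold_ms: assumed boundary between a small and a large
        positive VOT; 30 ms is an illustrative value, not a measured one.
    """
    if vot_ms < 0:
        # Voicing begins during the closure: a truly voiced stop.
        return "voiced"
    if vot_ms < aspiration_threshold_ms:
        # Voicing begins at or shortly after the release.
        return "voiceless unaspirated"
    # Voicing is delayed well past the release.
    return "voiceless aspirated"


# Illustrative values only: a prevoiced stop, a zero-VOT stop, and a
# stop with a long delay before voicing begins.
for vot in (-80.0, 0.0, 70.0):
    print(vot, classify_vot(vot))
```

A language with a three-way contrast uses all three labels. A language with only a voicing contrast or only an aspiration contrast in effect collapses two of them into one.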

One language that distinguishes all three categories is Thai. You can see the distinction in the following three images, which show the waveforms and spectrograms of the Thai syllables [tʰa], [ta], and [da]. (You can find the audio files here.) In the first image I've highlighted the aspiration region. You can see that there is no voicing (which shows up as energy near the bottom of the frequency range) until the onset of the vowel, but there is a long (70 millisecond) noise segment between the release of the stop closure and the onset of voicing. In the second image there is very little aspiration and no voicing during the stop closure. In the third image you can see voicing during the stop closure as well as some higher frequency noise.

Acoustic analysis of the Thai syllable [tʰa]

Acoustic analysis of the Thai syllable [ta]

Acoustic analysis of the Thai syllable [da]

Mandarin Chinese has just two series of stops and affricates, one aspirated, the other unaspirated. There is no voicing contrast. You can see this in the spectrograms and waveforms below, which show the syllables written pi and bi in pinyin, which phonetically are [pʰi] and [pi]. In the first there is a very long aspiration region; in the second there is neither appreciable aspiration nor any voicing prior to the onset of the vowel.

Acoustic analysis of the Chinese syllable [pʰi]

Acoustic analysis of the Chinese syllable [pi]

With this background, we can look again at what van Driem is saying. He is saying that to those who see the words Dzongkha and Tsong-kha-pa in print, it is obvious that they are different. Similarly, if one hears these words and can distinguish the voiced [dz] from the voiceless [ts], it is clear that they are different. If, however, one does not know how they are written and is unable to perceive the phonetic distinction, due to speaking a language that has no voicing distinction, then they may sound the same. A speaker of Mandarin Chinese might, he thinks, fail to distinguish these two words because Mandarin has only an aspiration distinction, so its speakers are not attuned to hearing a voicing distinction. For the Mandarin speaker, both [dz] and [ts] fall into the unaspirated category and so do not contrast.
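The merger can be illustrated with a toy model in Python. Everything here is an assumption made for illustration: the function perceive, the 30 millisecond aspiration boundary, and the VOT values assigned to [dz] and [ts] are invented, not measured. Real perception is of course far more complicated than a single threshold.

```python
ASPIRATION_THRESHOLD_MS = 30.0  # assumed boundary, for illustration only


def perceive(vot_ms, contrast):
    """Return the category a listener assigns to an initial consonant.

    contrast: "voicing" for a listener whose language splits the VOT
        continuum at zero, "aspiration" for a listener whose language
        splits it between small and large positive VOT.
    """
    if contrast == "voicing":
        return "voiced" if vot_ms < 0 else "voiceless"
    if contrast == "aspiration":
        if vot_ms >= ASPIRATION_THRESHOLD_MS:
            return "aspirated"
        return "unaspirated"
    raise ValueError(f"unknown contrast: {contrast!r}")


# Hypothetical VOT values for the two initials (not measurements).
initials = {"dz": -60.0, "ts": 10.0}

for contrast in ("voicing", "aspiration"):
    heard = {name: perceive(vot, contrast) for name, vot in initials.items()}
    distinct = heard["dz"] != heard["ts"]
    print(contrast, heard, "distinct" if distinct else "merged")
```

Under these assumptions, the listener with a voicing contrast assigns [dz] and [ts] to different categories, while the listener with only an aspiration contrast assigns both to the unaspirated category.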

[Incidentally, voice onset time also enters into the explanation for the existence of so many different names for the city of Beijing, which I discussed some time ago.]

