April 05, 2005

Mmm

In response to a couple of posts by Heidi Harley (and a note by me, and a post by Bob Kennedy over at Phonoloblog), Q. Pheevr presents some actual data on the phonetics of "the sound Marge Simpson makes to express some combination of disapproval, annoyance, and frustration". Q found some audio clips on the web, and made (wideband) spectrograms, which Q admits "don't really tell me a whole lot".

But a "wide-band spectrogram" -- one made with a short analysis window, and thus fine time resolution but coarse frequency resolution -- is usually not a good way to see what's happening with voice quality. Here's a better display of Q's first example:

The top panel is a pitch track (not at all believable in this case); the middle panel is a "narrow-band spectrogram" (made with a long analysis window, and thus coarse time resolution but fine frequency resolution -- the analysis bandwidth here is about 20 Hz, as opposed to the 200 Hz or so of Q's spectrograms); the bottom panel is the audio waveform.
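The trade-off between the two kinds of spectrogram can be put in rough numbers. The sketch below assumes the usual rule of thumb that the analysis bandwidth is about the reciprocal of the effective window duration (the exact constant depends on the window shape), and that harmonics show up as separate lines only when the bandwidth is narrower than their spacing, which is the fundamental frequency. The function names are made up for illustration, and the two fundamentals (350 Hz and 80 Hz) are just round values in the range of the pitches reported for this clip.

```python
def window_duration_s(bandwidth_hz: float) -> float:
    """Approximate analysis-window duration for a given bandwidth.

    Rule of thumb only: duration ~ 1 / bandwidth. The true constant
    depends on the window shape (Hamming, Gaussian, etc.).
    """
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return 1.0 / bandwidth_hz


def resolves_harmonics(bandwidth_hz: float, f0_hz: float) -> bool:
    """Harmonics are spaced f0 apart, so they can appear as separate
    lines only if the analysis bandwidth is narrower than f0."""
    return bandwidth_hz < f0_hz


for label, bandwidth in (("narrow-band", 20.0), ("wide-band", 200.0)):
    window_ms = 1000.0 * window_duration_s(bandwidth)
    print(f"{label}: ~{bandwidth:.0f} Hz bandwidth, ~{window_ms:.0f} ms window")
    for f0 in (350.0, 80.0):  # illustrative fundamentals, not measurements
        period_ms = 1000.0 / f0
        verdict = "resolved" if resolves_harmonics(bandwidth, f0) else "smeared together"
        print(f"  f0 {f0:.0f} Hz (period ~{period_ms:.1f} ms): harmonics {verdict}")
```

On this approximation, a 20 Hz bandwidth corresponds to a window of roughly 50 msec, long enough to span several glottal periods even at 80 Hz, while a 200 Hz bandwidth corresponds to a window of roughly 5 msec, shorter than a single period at that pitch.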

As you can see clearly in the waveform, there are three basic parts of the sound. The first part is short (about 100 msec), high-falling in pitch (about 350 Hz to 275 Hz), and relatively "pure" in voice quality. After a brief transitional segment of period doubling, the second part is longer (about 300 msec), lower in amplitude and slightly rising in pitch (about 70 Hz to 90 Hz), with a fair amount of "shimmer" (period-to-period amplitude variation). The fundamental is completely missing -- I suspect that this is due to the recording or some other aspect of the audio processing, though there might really be a nasal or voice-quality-related zero canceling the fundamental. The third part is the longest (about 430 msec) and the loudest. The glottal oscillation has become extremely variable, both in amplitude and in period, verging on what would be called "vocal fry" if there were fewer short-period components. I'd guess that there are several different modes of glottal oscillation going on at once, and the whole system is on the edge of chaos (probably in the technical sense of the word). The transition from the second to the third segment of the groan certainly involves some increased subglottal pressure, but there is probably a laryngeal-pharyngeal gesture as well, such as constriction of the false vocal folds and/or vertical tension on the larynx implemented by the strap muscles.
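For readers who want to put numbers on "variable, both in amplitude and in period", here is a minimal sketch of the usual "local" perturbation measures: the mean absolute difference between consecutive glottal cycles, divided by the mean value. Applied to per-cycle peak amplitudes this gives shimmer; applied to per-cycle durations it gives the period counterpart, usually called jitter. The function name and the sample values are hypothetical and chosen only to show the arithmetic. They are not measurements from the Marge clip, and finding the cycle boundaries in a signal this irregular is the hard part, which this sketch does not attempt.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean.

    Pass per-cycle peak amplitudes to get local shimmer, or per-cycle
    period durations to get local jitter.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle values, NOT measured from the recording.
steady_periods_ms = [12.5, 12.5, 12.5, 12.5, 12.5, 12.5]   # a steady 80 Hz
irregular_periods_ms = [12.0, 14.0, 11.0, 15.0]            # erratic cycles
alternating_amplitudes = [1.0, 0.8, 1.0, 0.8]              # strong/weak pulses

print("jitter, steady:     ", local_perturbation(steady_periods_ms))
print("jitter, irregular:  ", local_perturbation(irregular_periods_ms))
print("shimmer, alternating:", local_perturbation(alternating_amplitudes))
```

A perfectly regular voice scores zero on both measures, and the values climb as successive cycles differ more from one another in length or in size.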

Q nevertheless gives what I think is a pretty good description:

Anyway, I'd describe the sound as a possibly creaky-voiced bilabial nasal with a very narrow somethingo-pharyngeal secondary articulation and falling tone. I might be able to figure it out better if I could make the sound reliably myself. Heidi can; she writes:

Heck, I can make that annoyed noise too, distinguishing it from the yummy mmm noise by doing some trick with my pharynx/tongue root/larynx, together with the other normal features associated with bilabial nasals, and I don't have anything at all like Kavner's distinctive vocal apparatus.

But when I try to do it, sometimes it comes out sounding like Marge, but sometimes it sounds more like a wounded muskrat or a sexually frustrated wookiee. I also don't have a readily available corpus of non-annoyed Marge sounds with which to compare the samples above. (Marge Simpson has a rough life, you know; there's a lot for her to be annoyed about.) But perhaps these notes will inspire someone else to improve upon my description.

I'll wait to say more until I've seen more data. My Simpsons corpus has just arrived from amazon.com -- now all I need is some free time to do the research!

(Of course, acoustic analysis can only tell us so much. We really need to study Julie Kavner's speech production -- but then Heidi says she is fluent in Marge-ese, so she would be just as good a subject. What Marge is doing in the third segment of the utterance analyzed above might be something that is called "Dysphonia plicae ventricularis" when someone can't help doing it all the time:

Typically, patients with dysphonia plicae ventricularis (also called false vocal fold phonation or ventricular dysphonia) demonstrate a low-pitched, coarse or rough, monotone voice. The voice may have a breathy quality. Usually, hyperadduction of both the true and false vocal folds is present. Because the ventricular folds have difficulty in making a good firm approximation along their entire length, severe hoarseness and breathiness often result. Vocal fold scarring may be mistaken for this disorder and must be ruled out.

This disorder is frequently responsive to voice therapy that focuses on gestures such as gargling and sighing, which relax supraglottic muscles and isolate true vocal fold adduction from false vocal fold adduction.

Whatever is going on can probably be seen using a fiberoptic laryngoscope.)

Posted by Mark Liberman at April 5, 2005 05:05 PM