One of the oldest and best-established analogies between speech and music is the tendency to slow down at the ends of phrases. This is a natural consequence of the way our motor system performs rapid, temporally complex actions. But it has a perceptual side as well, since it reliably marks the structure of performances; and it can be consciously or culturally modulated, as the existence of the musical instruction ritardando al fine suggests.
In a post last year ("The shape of a spoken phrase", 4/12/2006), I showed some pictures of this effect in conversational speech; and a couple of days ago, I laid out a detailed design for a simple experiment, suitable for use in an introductory phonetics course, to examine the phenomenon in the laboratory ("Design for a class unit on cross-linguistic final lengthening", 11/17/2007).
That experiment consisted of reading aloud 100 seven-digit strings, in the style of American telephone numbers, arranged in such a way that each of the 10 digits occurs equally often in each of the 7 positions, and each of the 100 two-digit sequences occurs equally often spanning each adjacent pair of positions.
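For anyone who wants to build a list with these properties, here is one way to do it. It is a sketch of a standard modular-arithmetic construction, not necessarily how the original list was made: index each string by a pair of digits (a, b) and put (a + p*b) mod 10 at position p. The function names telephone_design, is_balanced and as_phone are hypothetical. One drawback worth noting: b = 0 yields strings like 4444444 and b = 5 yields alternations like 0505050, which a real reading list might want to avoid.

```python
import random
from collections import Counter


def telephone_design(n_positions=7, seed=None):
    """Return 100 balanced digit strings (a hypothetical construction).

    String (a, b) has digit (a + p*b) % 10 at position p. At any one
    position each digit appears 10 times, and across any adjacent pair of
    positions (p, p+1) the bigram is (x, x + b) with x = a + p*b, so each
    of the 100 bigrams appears exactly once.
    """
    strings = [
        "".join(str((a + p * b) % 10) for p in range(n_positions))
        for a in range(10)
        for b in range(10)
    ]
    # Shuffle the reading order; the balance properties are unaffected.
    random.Random(seed).shuffle(strings)
    return strings


def is_balanced(strings):
    """Check both balance properties stated in the design."""
    n = len(strings[0])
    for p in range(n):
        unigrams = Counter(s[p] for s in strings)
        if len(unigrams) != 10 or len(set(unigrams.values())) != 1:
            return False
    for p in range(n - 1):
        bigrams = Counter(s[p:p + 2] for s in strings)
        if len(bigrams) != 100 or len(set(bigrams.values())) != 1:
            return False
    return True


def as_phone(s):
    """Format a 7-digit string in the American 3+4 style, e.g. 555-0123."""
    return f"{s[:3]}-{s[3:]}"


if __name__ == "__main__":
    items = telephone_design(seed=2007)
    print(is_balanced(items))  # True
    for s in items[:3]:
        print(as_phone(s))
```

Because the balance holds for any n_positions, the same function could generate lists of other lengths for the variations suggested below.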
While watching the first half of an uncompetitive football game, I've segmented my own reading of the list.
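Once each digit token has been segmented, the averages in question are just group means. The sketch below assumes a hypothetical format of one (digit, position, duration_seconds) tuple per token; the layout of the actual data files isn't described here, and this Python is not the R script used for the original pictures. Because the design puts every digit equally often in every position, pooling over digit types at each position does not bias the position means toward whichever digits happen to land there.

```python
from collections import defaultdict
from statistics import mean


def mean_by_position(tokens):
    """Mean duration at each position, pooled over all digit types.

    `tokens` is an iterable of (digit, position, duration_seconds) tuples,
    one per segmented digit token (a hypothetical format).
    """
    groups = defaultdict(list)
    for _digit, position, duration in tokens:
        groups[position].append(duration)
    return {pos: mean(ds) for pos, ds in sorted(groups.items())}


def mean_by_digit_and_position(tokens):
    """Mean duration for every (digit, position) cell, for per-digit curves."""
    groups = defaultdict(list)
    for digit, position, duration in tokens:
        groups[(digit, position)].append(duration)
    return {key: mean(ds) for key, ds in sorted(groups.items())}
```

The first function gives the single averaged curve; the second gives one curve per digit type.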
Here's what the overall duration pattern looks like:
That is averaged across all 10 digit types. The individual digit types all show the general pattern of final lengthening, including the smaller amount of lengthening at the end of the first sub-phrase, but the details are quite different, case by case:
In particular, the phrase-position effects are superimposed on a large difference in basic duration. These differences are due in part to different numbers of syllables and segments, but also to different intrinsic durations of the vowels and consonants in question. This large effect of intrinsic duration is one of the things that separates speech rhythm from musical rhythm, where such effects (e.g. due to notes or note-sequences that are difficult to perform) do occur, but are suppressed as far as possible. (The implications of this for traditional distinctions between "stress-timed" and "syllable-timed" languages have yet to be entirely straightened out, since there are no reliable tendencies for speakers to adjust their performances even slightly in the direction of the allegedly isochronous intervals.)
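One simple way to look at the position effect with intrinsic duration factored out is to divide each token's duration by the mean duration of its digit type, then average those ratios by position. This is an assumed, illustrative normalization, not necessarily what was done for the pictures here; the function name and the (digit, position, duration_seconds) tuple format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def relative_duration_by_position(tokens):
    """Average of (token duration / mean duration of its digit type), per position.

    Dividing by the digit-type mean factors out intrinsic duration
    (number of syllables and segments, vowel and consonant length), so a
    value of 1.2 at a position means tokens there are about 20% longer
    than their digit's overall average.
    """
    tokens = list(tokens)  # iterated twice below

    by_digit = defaultdict(list)
    for digit, _position, duration in tokens:
        by_digit[digit].append(duration)
    digit_mean = {d: mean(ds) for d, ds in by_digit.items()}

    by_position = defaultdict(list)
    for digit, position, duration in tokens:
        by_position[position].append(duration / digit_mean[digit])
    return {pos: mean(rs) for pos, rs in sorted(by_position.items())}
```

A ratio treats lengthening as proportional to a digit's basic duration; whether final lengthening actually scales that way, or adds a roughly constant amount, is itself an empirical question that the per-digit curves could help answer.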
If the digit-string data were segmented more finely (e.g. distinguishing the individual consonants and vowels of "six" or "seven"), we could learn some things about the linguistic character of the effect, which is by no means a simple slowing of overall time. There is also something to be learned from repeating the experiment at two different speaking rates, and from comparing individual differences across members of a class, or across languages for which speakers are available.
For those who are tempted to head in that direction, some additional help may be provided by the R script that generated the pictures in this post, and the relevant data files.
[Readers with sharp eyes and good memories may have noticed that at least one minor aspect of these pictures seems inconsistent with the durational shape of 4-word phrases shown for conversational speech in my earlier post:
There, the second of four words was slightly shorter than the first and the third; here, it's slightly longer. My guess is that this is a rhythmic effect, caused by the typically alternating pattern of phrasal stress in telephone numbers. A variation on this experiment would be to look at stress patterns (e.g. ONE two three FOUR five six vs. ONE two THREE four FIVE six) crossed with phrasing (e.g. 12 - 3456 vs. 123 - 456 vs. 1234 - 56).]
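For that crossed variation, the reading prompts could be generated mechanically. The sketch below spells digits out as words, capitalizes the stressed positions and inserts a dash between groups, giving 2 stress patterns × 3 phrasings = 6 conditions. Marking stress with capitals in the prompt is an assumption about how the pattern would be elicited, and all names here are hypothetical.

```python
from itertools import product

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# The two stress patterns and three phrasings from the text, for 6 digits:
# stressed positions are 0-based indices, phrasings are group sizes.
STRESS_PATTERNS = {"1,4": {0, 3}, "1,3,5": {0, 2, 4}}
PHRASINGS = {"12 - 3456": (2, 4), "123 - 456": (3, 3), "1234 - 56": (4, 2)}


def render_prompt(digits, stressed, groups):
    """Spell out `digits`, upper-casing stressed positions and dashing groups."""
    if sum(groups) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    words = [DIGIT_WORDS[int(d)] for d in digits]
    words = [w.upper() if i in stressed else w for i, w in enumerate(words)]
    chunks, start = [], 0
    for size in groups:
        chunks.append(" ".join(words[start:start + size]))
        start += size
    return " - ".join(chunks)


def crossed_conditions(digits):
    """Every (stress pattern, phrasing, prompt) combination for one string."""
    return [(s, p, render_prompt(digits, STRESS_PATTERNS[s], PHRASINGS[p]))
            for s, p in product(STRESS_PATTERNS, PHRASINGS)]


if __name__ == "__main__":
    for stress, phrasing, prompt in crossed_conditions("123456"):
        print(f"{stress:6} {phrasing:10} {prompt}")
```

For example, render_prompt("123456", {0, 3}, (3, 3)) returns "ONE two three - FOUR five six".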
[Update -- John Burke writes:
In the film "Class Action," a lawyer cross-examining an elderly witness asks him if he's familiar with several number strings; one of these turns out to be his own phone number, but with the spoken digits grouped unconventionally, rather than in the standard pattern with lengthening of the third digit. (I'm fairly sure the conventional division of a phone number into area code, prefix, and individual four-digit number harkens back to the days when the prefix was an "exchange" like "Mission 6" or "Rhinelander 4," a real building in a neighborhood where all the phones with that prefix were located.) The witness doesn't recognize the number, and the lawyer uses this to cast doubt on the accuracy of his memory for other events.
As someone who used to work for the phone company, and who remembers when "area codes" were introduced, I can confirm that the 3+4 division dates to the days when "exchanges" (corresponding to the first two digits) had names. When I was growing up, for example, my family's phone number was HArrison 3-4488. The detailed history is discussed (of course) in the Wikipedia entry.]
Posted by Mark Liberman at November 19, 2007 07:10 AM