November 17, 2007

Design for a class unit on cross-linguistic final lengthening

Anya Lunden asked:

In a recent LL post ("The perils of mixing romance with language learning", 11/7/2007) you describe a very interesting class project on final lengthening cross-linguistically, and, even more enticingly, a class *unit* on FL. I would be very grateful for any references on cross-linguistic FL you might happen to have handy. I'm particularly interested in FL at the word level, and although this phonetic effect is established and appears to be cross-linguistic, I have been able to find relatively little data on it.

There are two questions here. One is about the literature on cross-linguistic final lengthening --  I suspect that Anya knows more about that than I do, but I'll send her a few thoughts separately. The second question is about a class project on final lengthening, suitable for comparing the size of the effect across sounds and individuals and languages.

This is something that I did about a dozen years ago, as a lab project in an introductory phonetics course. I'm afraid that the materials from that course are buried in old boxes and back-up tapes somewhere, if they still exist at all. But for my second Breakfast Experiment™ this morning, I'll try to reconstruct the recipe for the lab exercise, and show how quick and easy it is to put into effect. On some other morning, I'll take a shot at explaining why the question is interesting, for those of you who aren't already disposed to believe that it is.

The idea is to look at how the duration of a word (or syllable, or phonetic segment) is modulated by its phrasal position. In general, there are several kinds of barriers to doing this. For one thing, different segments and syllables and syllable-sequences have very different intrinsic durations, which may be modulated in different ways by emphasis and speaking rate and phrasing and so on. And there are many different sorts of phrases and phrasal relations, which may have different effects on speech timing. And rhetorical structures, and word frequencies, and the various vagaries and artistries of performance all have their important consequences. And then there are all the differences among languages. Putting it all together, an experiment to compare final lengthening across languages can be hard to design and interpret.

One way to deal with these difficulties is to look at a great deal of data, and hope that all the complexities balance out somehow. Jiahong Yuan, Chris Cieri and I took this approach, in some research discussed in an earlier Language Log post ("The shape of a spoken phrase", 5/12/2006) and published in part as "Towards an Integrated Understanding of Speaking Rate in Conversation", ICASSP 2006.

Another approach is to ask speakers to read (or repeat) phrases that have been artificially designed to put a designated set of elements in a carefully balanced and controlled set of positions in a limited set of phrases. An especially easy way to do this -- and one that is well adapted to use as a lab project in an introductory phonetics course -- is to look at structured sequences of elements like numbers and letters, such as telephone numbers, catalog identifiers and the like. These are structurally simple, semantically "flat", and rhetorically neutral, so that every element can freely occur in every structural position, and each such sequence of elements is open to about the same range of rhetorical and emotional interpretations as every other one. And such sequences translate trivially into other languages, at least those that have names for the digits or other elements that you use.

I originally tried this back around 1980, when I was at Bell Labs, as a simple and crude way to estimate the appropriate durational modifications for a speech synthesis system. It was the phone company, after all, and showing that we could do a decent job on telephone numbers made sense! But it's simple and quick enough to make a good lab project for a phonetics course. Each subject's recording takes only five or ten minutes to make and a couple of hours to measure; the students learn a fair amount about what speech sounds look like and how they interact; and the resulting data is intricately regular, offering plenty of fun statistical modeling.

Here's a bit of fancy footwork in R that does the right thing to create a balanced list of 7-digit (American-style) telephone numbers. You probably don't want to make your students understand how this works, or even to show it to them at all -- you just need to use it (or something equivalent) to create the patterns that they'll use in the experiment.

# Build a 100 x 7 matrix of digits, balanced so that each digit 0-9
# occurs equally often in each column, and each ordered pair of digits
# occurs equally often across each pair of adjacent columns.
X <- matrix(nrow=100, ncol=7)
X[,1] <- sample(rep(0:9,10), 100)   # column 1: a random permutation of ten copies of 0-9
for(c in 2:7){
   for(p in 0:9){
      # the 10 rows whose previous column equals p get a fresh permutation of 0-9
      X[X[,c-1]==p, c] <- sample(0:9, 10)
   }
}
write.table(X, file="sequence1.txt", row.names=FALSE, col.names=FALSE)

(You could create similar collections of, e.g., 10-digit numbers by changing the 7 to 10 in both the matrix() call and the for() loop.) The result is 100 rows of seven digits each, with the property that each of the ten digits occurs equally often in each column, and each of the 100 possible pairs of digits occurs equally often across each pair of adjacent columns. The first five rows look like this:

2 6 2 2 8 0 2
6 4 4 1 9 0 3
3 7 7 7 7 3 3
1 7 2 7 6 7 2
2 4 3 2 1 2 0

The whole output of this particular run is here. Of course, your results would be randomly different, since each run will have different pseudorandom permutations at each choice point.
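
If you want to reassure yourself (or your students) that those balance properties really hold, a few more lines of R will check them. This is just a sanity-check sketch, not part of the lab materials; it assumes the sequence1.txt file written above.

# Read the matrix back in and check the balance properties.
X <- as.matrix(read.table("sequence1.txt"))

# Each digit should occur exactly 10 times in every column.
apply(X, 2, function(col) table(factor(col, levels=0:9)))

# Each ordered pair of digits should occur exactly once across every
# pair of adjacent columns.
for(c in 2:7){
   pairs <- table(factor(X[,c-1], levels=0:9), factor(X[,c], levels=0:9))
   stopifnot(all(pairs == 1))
}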

It's a good idea to format the strings so as to signal the grouping that you want speakers to use. A simple way to do it in this case, using GNU awk, might be:

gawk '{printf("(%d) %d%d%d - %d%d%d%d\n",NR,$1,$2,$3,$4,$5,$6,$7)}' sequence1.txt >sequence1a.txt

The results would start this way:

(1) 262 - 2802
(2) 644 - 1903
(3) 377 - 7733
(4) 172 - 7672
(5) 243 - 2120

The whole output in this format is here.

And if we wrap it with a bit more formatting, we can make an xhtml slide show, with some initial instructions on the first slide, and then each digit string on its own page, so that students (or the friends and acquaintances that they recruit) can easily keep track of where they are in the sequence of strings. (Someday, browsers will have advanced to the point where you could actually record each string via a javascript call, or the like, and perhaps even do automatic -- and accurate -- phonetic-segment alignment.)
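
In case it's useful, here's one way that wrapping might go -- a minimal sketch in R, where the output directory slides/, the file names, and the instruction text are placeholders of my own, not part of the original materials. It writes one bare-bones page per digit string, each linking to the next:

# Minimal sketch: one simple page per digit string, plus an instruction page,
# each page linking to the next. All names below are placeholders.
lines <- readLines("sequence1a.txt")
dir.create("slides", showWarnings=FALSE)

page <- function(body, next_file){
   paste0("<html><head><title>Digit strings</title></head>\n",
          "<body style='font-size:48pt; text-align:center'>\n",
          "<p>", body, "</p>\n",
          "<p><a href='", next_file, "'>next</a></p>\n",
          "</body></html>\n")
}

# First page: instructions for the speaker.
writeLines(page("Please read each telephone number aloud, then click next.",
                "slide001.html"),
           "slides/slide000.html")

# One page per digit string; the last one points back to the instructions.
for(i in seq_along(lines)){
   next_file <- if(i < length(lines)) sprintf("slide%03d.html", i+1) else "slide000.html"
   writeLines(page(lines[i], next_file), sprintf("slides/slide%03d.html", i))
}

Pointing a browser at slides/slide000.html then steps the speaker through the whole list, one string per page.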

I recorded the list -- it took me about seven and a half minutes. I'll post the measurements when I have a chance to do them -- segmenting this much stuff by hand takes a couple of hours, which is too much labor for one Breakfast Experiment™, especially on a morning when I'm doing two of them.
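
When those measurements do exist, the "fun statistical modeling" promised above can start very simply. Here's a sketch in R, where durations.csv and its column names (digit, position, duration) are hypothetical stand-ins for whatever per-digit measurement format you end up with:

# Hypothetical measurement file: one row per spoken digit, with columns
#   digit    -- which digit was said (0-9)
#   position -- its position in the telephone number (1-7)
#   duration -- its measured duration in milliseconds
d <- read.csv("durations.csv")

# Mean duration by position, collapsing over digit identity.
aggregate(duration ~ position, data=d, FUN=mean)

# A simple additive model: digit identity plus position, both as factors.
# The position coefficients give a first look at positional (e.g. final)
# lengthening, net of intrinsic digit durations.
summary(lm(duration ~ factor(digit) + factor(position), data=d))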

Posted by Mark Liberman at November 17, 2007 08:57 AM