November 01, 2003

Earwitnesses, voiceprints, automatic speaker recognition

There's an interesting article in Legal Affairs about the history and current legal status of earwitness testimony, "expert" testimony on "voiceprints", and speaker recognition technology. [Via Arts and Letters Daily].

Here is the home page for Speaker Recognition Evaluations at the National Institute of Standards and Technology (NIST), where you can find some information about NIST's speaker recognition tests. A little poking around will find you this 2002 paper by Mark Przybocki and Alvin Martin, summarizing the history. Their figure 10 is reproduced below, showing the results of the 2001 evaluation with an operating point of about 1% false alarms (i.e. the system says "that's the one" when it isn't) and around 20% misses (i.e. the system says "that's not the one" when it is). I think this is better than humans could do on this type of task (where there are hundreds of unknown speakers and the speech is recorded over the phone), though I don't know that there are exactly comparable human benchmarks.

This kind of plot, showing the trade-off between misses and false alarms as a detection system's threshold is varied, is called a DET curve. This is a paper by some NIST researchers explaining the nature and value of DET curves.
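To make the trade-off concrete, here is a minimal sketch in Python of how the points on such a curve could be computed. It assumes the system assigns each trial a numeric score, with higher scores meaning "more likely the same speaker"; the function name det_points and the scores in the example are invented for illustration and are not taken from the NIST evaluations. Published DET curves also plot both error rates on a normal-deviate scale, which this sketch leaves out.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) triples.

    target_scores: scores for trials where the test speaker IS the claimed one.
    nontarget_scores: scores for trials where the test speaker is someone else.
    A trial is accepted ("that's the one") when its score >= threshold.
    """
    # Every observed score is a candidate threshold, in increasing order.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a true target trial that falls below the threshold.
        misses = sum(1 for s in target_scores if s < t)
        # False alarm: a non-target trial at or above the threshold.
        false_alarms = sum(1 for s in nontarget_scores if s >= t)
        points.append(
            (t, misses / len(target_scores), false_alarms / len(nontarget_scores))
        )
    return points


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    targets = [0.9, 0.8, 0.7, 0.4, 0.35]
    nontargets = [0.6, 0.3, 0.2, 0.1, 0.05]
    for threshold, miss, fa in det_points(targets, nontargets):
        print(f"threshold={threshold:.2f}  miss={miss:.2f}  false_alarm={fa:.2f}")
```

Raising the threshold can only turn acceptances into rejections, so the miss rate never goes down and the false alarm rate never goes up. An operating point like the one quoted above (about 1% false alarms, around 20% misses) is a single position along that sweep.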

[Update 11/2/2003: Like many people, I've personally experienced some extraordinary feats of speaker recognition. A few years ago, for example, when I answered the phone and someone said "Hello, is this Mark?", I instantly recognized the voice as someone I had known well in college, but had been out of touch with ever since. I had no reason to expect them to call me, and in fact I hadn't thought of them in years.

Cases like this always seem to involve friends or at least people I once spent a lot of time with, just like the similar cases of face recognition, which are commoner and therefore less surprising. And I've also had some embarrassing misses, where I meet someone on the street whom I once knew well, and they say "don't you remember me?", and I don't. Or phone calls where I fail to recognize the voice of a current acquaintance.

There have also been a few false alarms, though not many and never very lasting -- cases where I hear someone talking in a public space and for a minute I think I recognize their voice, but it turns out that they're no one that I ever knew. And I'm sure I'd be really bad at the kind of test the NIST folks are running, where a few seconds of speech from an unknown test speaker has to be implicitly compared with hundreds of equally unfamiliar reference speakers.

Still, the occasional extraordinary anecdote does give me the intuitive though perhaps irrational belief that somehow the identity is in there. This kind of feeling may be responsible for the credence the law seems to give to earwitness testimony. Along with uncritical respect for technology -- especially technology with complicated pictures -- this belief also may have helped make "voiceprints" seem plausible.]

Posted by Mark Liberman at November 1, 2003 12:19 PM