December 19, 2003

Computers singing in Barcelona

I've recently been reading some papers by Xavier Serra, Jordi Bonada and other researchers at the IUA (Institut Universitari de l'Audiovisual) at the Universitat Pompeu Fabra. According to its web site, this university was founded in 1990, and named after Pompeu Fabra, a linguist who "laid down the standards of the modern Catalan language". I can't think of another major university named after a linguist, but perhaps someone will inform or remind me.

The papers that I've been reading are about techniques for analysis, modification and resynthesis of sounds in general and speech in particular. (To see the last link, you may have to press Cancel on an annoying little pop-up that wants you to go to their home page and then navigate through four layers to the abstract in question.) These are the techniques involved in Yamaha's Vocaloid system for synthesis of the singing voice, due to be released in January. I teach a course in digital signal processing for outsiders (here is last spring's web site), and I'm planning to put together a module on these new techniques for singing synthesis. They're fun to play with as well as interesting and useful, and they're an excellent illustration of many basic DSP concepts and techniques.

I'm withholding judgment on the Yamaha system until I can try it out interactively. Their demos are impressive, but for evaluating speech synthesis, prepared demos are not very meaningful. They're more or less like screen shots of an interactive program -- they tell you something, but not much. The underlying signal processing techniques are very interesting, though, and should in principle be able to support what Yamaha promises. It's just that there are a lot of other steps, beyond the signal processing itself, where problems could arise.

If you're interested in learning more about the fundamentals, there's a (slightly sketchy) tutorial for the (free software) CLAM system from IUA, in which you should look especially at sections 8 and 9 on SMS (Spectral Modeling Synthesis) analysis and synthesis.
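
For a rough sense of what the analysis, modification and resynthesis cycle involves, here is a minimal Python sketch of the sinusoidal half of the idea. It is not CLAM's API and not what Vocaloid does: the function names (dft, analyze_peak, synthesize, decompose) are hypothetical, and the sketch assumes a single frame, no window function, and a sinusoid whose frequency falls exactly on a DFT bin. A real system of this kind tracks many partials across overlapping windowed frames and models the residual separately.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def analyze_peak(frame, sample_rate):
    """Analysis: return (frequency_hz, amplitude, phase) of the strongest bin.

    Assumption: the sinusoid sits exactly on a bin, so no interpolation
    between bins is attempted. DC and Nyquist bins are skipped.
    """
    n = len(frame)
    spectrum = dft(frame)
    k = max(range(1, len(spectrum) - 1), key=lambda i: abs(spectrum[i]))
    amplitude = 2 * abs(spectrum[k]) / n
    phase = cmath.phase(spectrum[k])
    return k * sample_rate / n, amplitude, phase


def synthesize(freq, amplitude, phase, n, sample_rate):
    """Resynthesis: rebuild n samples of one sinusoid from its parameters."""
    return [
        amplitude * math.cos(2 * math.pi * freq * t / sample_rate + phase)
        for t in range(n)
    ]


def decompose(frame, sample_rate):
    """Split a frame into one sinusoid's parameters plus a residual signal."""
    params = analyze_peak(frame, sample_rate)
    sinusoid = synthesize(*params, len(frame), sample_rate)
    residual = [x - s for x, s in zip(frame, sinusoid)]
    return params, residual
```

The modification step then amounts to changing the parameters before resynthesis: calling synthesize with freq * 2 instead of freq, for instance, gives the same partial an octave higher, and the residual can be added back in afterwards.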

Posted by Mark Liberman at December 19, 2003 07:32 AM