Language Log: New Open-source speech code from IBM

September 13, 2004

According to an article by Steve Lohr in today's NYT, IBM is announcing today that it will donate some source code related to speech technology to two open-source software groups. Apache will get some software for dealing with spoken dates, times and locations, and Eclipse will get some "speech editing tools". The NYT article doesn't explain clearly what the software really is and does; in fact, what the article says is somewhat misleading.

There's an item on the Eclipse site today about a so-called "Voice Tools Project", according to which

The Voice Tools Technology Project will focus on Voice Application tools in the JSP/J2EE space, based on W3C standards, so that these standards become dominant in voice application development. ... Initially, Voice Tools will consist of editors for VoiceXML, the XML Form of SRGS (Speech Recognition Grammar Specification), and CCXML (Call Control eXtensible Markup Language). Implementations of other tools that implement W3C voice standards, such as the LexiconML (Pronunciation Markup Language), will be added as the standards solidify and the Voice Tools Eclipse community grows.

The same announcement mentions "committers" from SBC Communications and VoiceGenie as well as IBM. As of 1:00 p.m. today, there will be a newsgroup which may have some more information. So far, though, this looks as if it will mainly be interesting to people who want to build interactive voice applications using open-source software for the framework controlling the interaction -- the available options for the component technologies (such as speech recognition and synthesis) are not changed. And if you're looking for open-source software for what you might think the NYT's phrase "speech editing tools" means, try Audacity, WaveSurfer, or Praat.

Here's the IBM press release. It gives a longer list of participants in the Voice Tools Technology Project: "Apptera, AT&T, Audium, Avaya, Cisco, Fluency, Genesys, Kirusa, Loquendo, Motorola, Nortel, Nuance, Openstream, ScanSoft, Siebel, Syntellect, Telisma, TuVox, V-Enable, Viecore, Vocomo, VoiceGenie, Voice Partners, and VoxGeneration".

It also explains that what IBM is donating to Apache is "Reusable Dialog Components (RDCs)". In more detail:

Pre-built speech software components, or "building blocks" that handle basic functions such as date, time, currency, locations (major cities, states, zip codes), RDCs are often-used functions in speech-enabled infrastructure applications. These allow a caller to, for example, book a flight using an auto-agent over the phone. Multiple reusable dialog components can be aggregated to provide higher levels of user functionality.
Developed by IBM Research, RDCs are Java Server Page (JSP) tags that enable dynamic development of voice applications and multimodal user interfaces. JSPs that incorporate RDC tags automatically generate W3C VoiceXML 2.0 at runtime -- providing a standard basis for speech applications. By providing familiar and standards-based programming models, J2EE developers can add voice interaction to Web applications. And by making the RDC framework available to the community, speech components built using it will work together, regardless of the vendor that created them.
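
To make the press release's description a bit more concrete, here is a rough sketch of the general idea: server-side Java code that emits a VoiceXML 2.0 form at request time, with one field per reusable piece of dialog. This is not IBM's RDC code or its tag library, which I haven't seen; the class and method names (RdcSketch, field, document) and the prompts are hypothetical, made up for illustration. The only things taken from the W3C VoiceXML 2.0 specification are the vxml/form/field/prompt elements, the namespace, and the built-in field types "date" and "time".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a stand-in for what a dialog-component
// tag might generate. Not IBM's RDC API.
public class RdcSketch {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One "component": a VoiceXML field that collects a value of a
    // built-in type such as "date" or "time".
    static String field(String name, String builtinType, String prompt) {
        return "    <field name=\"" + escape(name)
             + "\" type=\"" + escape(builtinType) + "\">\n"
             + "      <prompt>" + escape(prompt) + "</prompt>\n"
             + "    </field>\n";
    }

    // Aggregate several components into a single VoiceXML 2.0 document.
    static String document(String formId, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n");
        sb.append("  <form id=\"").append(escape(formId)).append("\">\n");
        for (String f : fields) {
            sb.append(f);
        }
        sb.append("  </form>\n");
        sb.append("</vxml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two components combined into one dialog, as in the
        // flight-booking example from the press release.
        String vxml = document("flight", Arrays.asList(
            field("departureDate", "date", "What date do you want to leave?"),
            field("departureTime", "time", "And at what time?")));
        System.out.print(vxml);
    }
}
```

In a real JSP setting, each call to field would presumably be replaced by a custom tag in the page, and the VoiceXML would go to a voice browser rather than to standard output.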

So the Apache stuff is also oriented towards establishing standards for Voice I/O in call center applications and the like. There's nothing on the Apache web site yet about this.

Posted by Mark Liberman at September 13, 2004 12:09 PM