Speech Technology - June 2008 - (Page 44)

JIM LARSON
FORWARD THINKING

The Evolution of IVR Systems
The next phase of IVR development centers on multimodality and faster transactions

Over the years, IVR technology has evolved in four major phases:

Generation 1: Touchtone input and voice output
Systems presented prerecorded voice prompts to callers, who responded by pressing keys on a touchtone phone. While this simple technology was widely deployed, callers complained about form factors (moving the handset between ear and eyes), getting lost in large menus without being able to back out, and time-consuming traversal of numerous options.

Generation 2: Speech input and output
Systems resolved many first-generation problems by supporting the automatic recognition of user speech and responding with prerecorded verbal messages or dynamically generated messages using synthesized speech. Call routing technology replaced long menu hierarchies. Clever error-handling dialogues helped users overcome confusion when problems arose. Callers no longer needed to move handsets between ears and eyes; they simply responded by voice.

Generation 3: Speech input and output and visual output
We are now on the edge of third-generation IVR systems. IVRs will use the small displays available on today’s phones and handheld mobile devices in two ways:

• Media viewer. Screens will present illustrations, animation, and video to callers and support more than just TV applications on a mobile device; they will involve personalized interaction with artificial agents. Callers will observe and internalize information using visual components. These visual elements will support a wide variety of new applications not previously possible for phones, including entertainment (games, video clips, and shows), training, and shopping applications.
• Scratchpad. Callers will no longer need to wait while verbal menus are read to them. Instead, they can scan the screen and select the appropriate option by speaking or pressing buttons on the phone’s keypad. A software agent will guide callers in the construction of queries. Partial queries will be presented on the display, along with options to complete the construction of the query. In effect, the display extends the callers’ memories, both short-term (by displaying partially constructed queries) and long-term (by presenting options and alternatives that callers no longer need to memorize).

Developers need not wait for VoiceXML 3.0 to create third-generation IVR applications. Several voice platform vendors have extended VoiceXML 2.1 to include graphics and video. In addition, several VoiceXML platform vendors have built upon two VoiceXML elements:

• <audio>, originally used to replay audio files, has been extended to replay videos and present image files; and
• <record>, originally used to capture audio files, has been extended to capture video and image files.

Although widely available in Europe, 3G communications technology is new to the United States. This technology has the bandwidth to support the dynamic uploading and downloading of video and image files. However, 3G devices will frequently use non-3G networks, so slow networks will remain a reality for a few years and may hinder applications requiring high volumes of data, such as video.

Generation 4: Multimodal modes of input and multimedia output
IVRs will support multiple modes of input, including speech and handwriting recognition and keyboard input. Alternative input modes enable one mode to back up another, meaning that if speech recognition fails, the user can press buttons. Users can also select the appropriate input mode (e.g., speech recognition while walking and handwriting or key input during a business meeting). Other technologies could include GPS to identify the mobile device’s location and sensors to detect the device’s orientation.

VoiceXML already supports touchtone and speech recognition. While the World Wide Web Consortium’s Multimodal Interaction Working Group is specifying a distributed multimodal architecture, researchers are investigating extending VoiceXML to support handwriting recognition (using the Speech Recognition Grammar Specification to indicate grammars describing words) and keyboard input. They are also researching ways to detect a user’s emotions based on various biometric techniques.

Each IVR generation has generated new types of mobile devices. IVR-G1 and IVR-G2 devices use cell phones with push-button keypads. IVR-G3 devices will also contain small display screens. IVR-G4 will likely be a Swiss Army knife device with all types of attachments for specialized functions. And each IVR generation will enable new and useful applications that will help mobile users access the Web and stay in contact with friends and family from anywhere.

James A. Larson, Ph.D., is co-program chair for the SpeechTEK 2008 Conference, cochair of the World Wide Web Consortium’s Voice Browser Working Group, and author of the home-study guide The VoiceXML Guide (www.vxmlguide.com). He can be reached at jim@larson-tech.com.
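The Generation 4 idea of one input mode backing up another can be illustrated with a short sketch. This is not VoiceXML and not any vendor's API; the names collect_choice, failed_speech, and keypad_press are hypothetical stand-ins, and the sketch assumes each input mode either returns the caller's choice or reports that it could not produce one.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: an input mode is a name plus a function that returns
# the caller's choice, or None when that mode could not produce one.
InputMode = Tuple[str, Callable[[], Optional[str]]]


def collect_choice(modes: Sequence[InputMode]) -> Optional[Tuple[str, str]]:
    """Try each mode in order; return (mode_name, choice) for the first success."""
    for name, read in modes:
        choice = read()
        if choice is not None:
            return name, choice
    return None  # every mode failed


def failed_speech() -> Optional[str]:
    # Stand-in for a speech recognizer that found no match.
    return None


def keypad_press() -> Optional[str]:
    # Stand-in for a caller pressing "2" on the keypad.
    return "2"


result = collect_choice([("speech", failed_speech), ("keypad", keypad_press)])
print(result)  # ('keypad', '2')
```

Ordering the list by preference is one possible design choice: an application could put speech first for a caller who is walking and keypad or handwriting first for a caller in a meeting.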