Speech Technology - October 2008 - (Page 40)

STANDARDS
DEBORAH DAHL

Opening the World of Multimodality

Standards can help bring more applications to bear

We are starting to see some very exciting multimodal applications, especially in the area of voice search. Being able to ask for information by speaking, and then seeing the results on a screen, is a powerful paradigm for interacting with a mobile device. On the one hand, speech input avoids annoyances inherent in graphical user interfaces (GUIs), such as scrolling through long lists and torturous keypad entry. On the other hand, GUI output avoids some of the annoyances inherent in voice interfaces, such as the problem of conveying large amounts of information by voice.

Today’s multimodal voice search applications are available from companies like Tellme Networks, Google, and vlingo. But these are proprietary applications. What if third-party developers could create similar applications just as easily as millions of developers today can create Web pages?

The key to opening up multimodal applications to a wider developer base is standards. Like HTML for the Web and VoiceXML in the voice world, upcoming multimodal standards will enable many more developers to create multimodal applications.

My column, “A Framework for Multimodal Apps” (July/August 2008), covered the Multimodal Interaction (MMI) Architecture being developed by the World Wide Web Consortium. This architecture defines an Interaction Manager that coordinates components to support speech, graphics, pointing, and other modalities. It has some very attractive features, such as a natural extensibility to new modalities and excellent support for distributed applications. This ability to be distributed supports applications using widely dispersed components. Distributed components could be used to create modality mashups, where modality components are developed by third parties with specialized expertise and are then integrated into new applications using the MMI Architecture.

However, the architecture by itself isn’t enough to create applications. We need specific software, such as Web browsers and voice browsers, and markup languages like VoiceXML and HTML, to actually build applications. A Web server running an Interaction Manager could communicate with distributed modality components over the Web, such as a remote speech recognizer or text-to-speech engine. In this case, the components would communicate using HTTP, the standard Web communication protocol. Alternatively, one or more modality components or the Interaction Manager might run locally on a device. Both of these approaches would use the MMI Architecture, but with different implementations of components.

To make the MMI Architecture more concrete, the W3C Multimodal Interaction Working Group recently published a document on authoring called “Authoring Applications for the Multimodal Architecture” (www.w3.org/TR/mmi-auth). This specification provides one example of how to create multimodal applications using the MMI Architecture. The sample application, a simple online ordering application, is based on current standard technologies, including SCXML, HTML, and VoiceXML. In the example described in this document, the Interaction Manager is implemented with SCXML, the graphical modality is implemented with HTML, and the voice modality is implemented with VoiceXML. The SCXML Interaction Manager sends messages to the Web browser and voice browser telling them to speak or display information. The Web browser and voice browser interact with the user and deliver the user’s input back to the Interaction Manager, which then moves on to the next step in the application.

This design is very good for supporting distributed applications. However, today’s Web and voice browsers have no direct way of receiving messages from an external component like the Interaction Manager. Both types of browsers have been designed to take charge of the interaction with the user. This makes sense in a unimodal voice or Web application, but multimodality by its nature requires coordination across modalities. By introducing the Interaction Manager for this coordination, another component, not the HTML or VoiceXML application, interacts with the user.

While future versions of HTML and VoiceXML might respond to external control messages, even with today’s browsers, techniques exist that can be used to simulate the ability to receive messages in straightforward ways. These are illustrated in detail in the authoring document.

With the principles of the MMI Architecture and specific examples illustrated in the authoring document, we are now coming a long way toward opening up multimodal applications to a wide range of developers. This will, in turn, accelerate the development of a whole new world of multimodal applications.

Deborah Dahl, Ph.D., is the principal at speech and language technology consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversational-technologies.com.
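
As a rough illustration of the coordination loop the column describes, where the Interaction Manager tells the Web browser and voice browser what to present, takes the user's input back from whichever one answers, and then moves on to the next step, here is a minimal Python sketch. It is not the SCXML, HTML, and VoiceXML implementation from the authoring document. The class names, the two-step ordering dialog, and the in-process calls that stand in for HTTP messages are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModalityComponent:
    """Hypothetical stand-in for a Web browser or voice browser."""
    name: str
    verb: str  # "display" for the graphical modality, "speak" for voice
    log: List[str] = field(default_factory=list)

    def handle(self, prompt: str) -> None:
        # A real component would render HTML or play a VoiceXML prompt;
        # here we only record what the component was told to present.
        self.log.append(f"{self.verb}: {prompt}")


class InteractionManager:
    """Hypothetical coordinator that walks a fixed list of dialog steps."""

    def __init__(self, components: List[ModalityComponent],
                 steps: List[Tuple[str, str]]) -> None:
        self.components = components
        self.steps = steps  # (slot name, prompt text) pairs
        self.index = 0
        self.answers: Dict[str, Tuple[str, str]] = {}

    @property
    def done(self) -> bool:
        return self.index >= len(self.steps)

    def prompt_current_step(self) -> None:
        # Send the same prompt to every modality component.
        if self.done:
            return
        _, prompt = self.steps[self.index]
        for component in self.components:
            component.handle(prompt)

    def receive_input(self, source: str, value: str) -> None:
        # Accept input from any modality, store it, and advance.
        if self.done:
            raise RuntimeError("dialog already finished")
        slot, _ = self.steps[self.index]
        self.answers[slot] = (source, value)
        self.index += 1
        self.prompt_current_step()


# Assumed two-step ordering dialog, loosely modeled on the sample application.
gui = ModalityComponent("web-browser", "display")
voice = ModalityComponent("voice-browser", "speak")
im = InteractionManager(
    [gui, voice],
    [("item", "What would you like to order?"), ("quantity", "How many?")],
)

im.prompt_current_step()
im.receive_input("voice-browser", "pizza")  # user answers by speaking
im.receive_input("web-browser", "2")        # user answers on the screen
print(im.answers)
```

In a distributed deployment the direct method calls would be replaced by messages over HTTP, and because today's browsers cannot be pushed to directly, the components would need one of the simulation techniques that the authoring document covers.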