Speech Technology - October 2008 - (Page 17)

COVER STORY

So far, the industry has made significant progress in its quest to deliver better speech systems, but these improvements have come in a limited context. Because the problems associated with emotive speech are so intimidating, most speech scientists have focused (at least to date) on dealing with “normal speech,” that is, speech that does not display any of the previously mentioned variabilities. The end result is that most present speech synthesis systems do not exhibit emotion and instead produce bland, neutral, machine-like speech.

Vendors have been trying to add emotions to speech systems in a couple of ways, with varying levels of success. “The industry has made more progress focusing on the linguistic features found when expressing different emotions than with examining the acoustical elements,” admits Dan Faulkner, director of product management and offer marketing at Nuance Communications. The linguistic approach focuses on word choices when responding to a user. The phrase “I can see you are having trouble” may be used to evoke a sense of empathy if a user seems to be stuck on a step in the call process.

Adding Features

Vendors have spent a great deal of time, money, and effort trying to determine how different words impact customer exchanges. They have found that certain words, such as “just” and “simply,” trigger various responses in customers. A few suppliers have also been trying to take that knowledge and use it to improve their systems’ effectiveness.

Loquendo has focused on adding emotional features to its text-to-speech systems. Patrizia Pautasso, marketing and business development manager at Loquendo, views the company’s work not as pioneering new technology but as an extension of the basic idea of concatenative synthesis: extracting segments of true human speech and playing them back in different combinations.
Rather than concentrating on having short phoneme sequences evoke certain feelings, the vendor has focused on using entire phrases that have expressive power. Certain phrases are chosen to represent “speech acts” (i.e., common linguistic expressions with a strong pragmatic and social intention, such as greetings, requests, thanks, approvals, and apologies). Loquendo’s Expressive Cues feature, which works in multiple languages, provides a series of commonly used expressions said by Loquendo’s voices in an expressive way.

Emotional Bonds

The more difficult task centers on trying to acoustically connect emotions, such as anger, happiness, and sadness, to synthetic speech systems dynamically. At the moment, delivery of such products is theoretical rather than practical, and no vendor offers a system capable of generating dynamic, real-time emotive speech output. Here, vendors need to figure out how variations in speech patterns correspond to different emotions.

Three areas present significant challenges in adding emotion to speech systems: intonation, voice quality, and interaction variability.

Intonation centers on the placement of word-level and utterance-level accents. A lot of work has been done on the description of intonation contours, and some rules have been produced for assigning contours to synthetic speech based on parsing its verbal content. While current systems have made progress in this respect, the limitations of word parsing and intonation rules mean that no system can assign the correct contour for every possible utterance a person could make.

The underlying “personality” of a synthesized voice is a major contributor to whether it sounds natural. Systems based on prerecorded speech perform well in this respect because the speaker’s voice quality comes through in the resynthesized speech; however, this option is not available in all cases.
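To make the idea of rule-based contour assignment concrete, here is a minimal sketch, not any vendor's method. It assumes a toy rule set for English (yes/no questions tend to rise, wh-questions and statements tend to fall), and the function name, the contour labels, and the word list are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical word list; real systems parse far more than the first word.
WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def assign_contour(utterance: str) -> str:
    """Pick a coarse utterance-level contour label from surface cues only."""
    text = utterance.strip()
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return "none"
    if text.endswith("?"):
        # Toy rule: wh-questions fall, other questions rise.
        return "fall" if words[0] in WH_WORDS else "rise"
    if text.endswith("!"):
        return "emphatic-fall"
    # Default for statements.
    return "fall"


for prompt in ("Are you still there?",
               "Where are you calling from?",
               "I can see you are having trouble."):
    print(prompt, "->", assign_contour(prompt))
```

A rule set like this sees only punctuation and the first word, so it gives a sincere “Oh, great.” and a sarcastic one the same contour. That is the kind of limitation the parsing-based rules described above run into.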
Machine voice output has been improving, but still falls short of the vocal granularity found in human speech. Also, vendors do not want to simply generate an emotive response; they need