Speech Technology - October 2008 (Page 32)

Automatic Speech Recognition
Sponsored Content | October 2008

Looking Over the Engine, Checking Under the Hood

The key to success in voice applications today, from CRM applications to mobile devices and automotive solutions, is a well-designed, natural, and truly accurate speech interface, which cannot be realized without a first-rate ASR (Automatic Speech Recognition) engine to power it. After all, your voice-enabled service is the face your company presents to your customers, so the naturalness and user-friendliness of your voice interface is the key to enhancing the customer experience or, if you get it wrong, turning it into a nightmare.

ASR technologies have now reached a high level of maturity, enabling the proliferation of commercial applications on the market. For more discerning integrators, however, for whom the quality of their solutions and the satisfaction of their customers are central, identifying which products can deliver a high level of performance is not always easy. It takes knowledge and experience to distinguish best-of-breed products from those that are frankly not up to the job.

Accurate, multilingual speech recognition on large-scale vocabularies, while indispensable, is really just the starting point. The main requirements for a high-quality ASR are: speaker independence, enabling the recognition of continuous speech from any speaker without prior training; high accuracy, along with the capability to return a set of N-best hypotheses together with reliable confidence values, which is key to building good dialogue-flow management; and the capability to support both grammar-based applications and Statistical Language Models for more complex interactions.

These are the essential requirements for any speech recognition technology. If you take a look "under the hood" at how an ASR engine has been engineered, however, you will soon discover that it is common practice to use either HMMs (Hidden Markov Models) or NNs (Neural Networks) for the core algorithms. Loquendo ASR actually combines both of these approaches, resulting in high-performance speech recognition and increased efficiency with large vocabularies (from several hundred up to hundreds of thousands of words).

The efficiency of the ASR is also fundamental in reducing hardware infrastructure costs: an ASR with low computational requirements allows a larger number of recognition channels to run simultaneously on the same hardware. Loquendo ASR has been carefully optimized and is, in fact, so efficient that its core engine can also be used on embedded platforms such as smartphones and navigation devices.

Extended standards support should also be considered when evaluating an ASR: compliance with MRCP (for client-server architectures); complete support for grammar standards such as W3C SRGS and SISR, which enables optimization for VoiceXML applications; and support for AURORA DSR (for distributed speech recognition) all ensure that customer investments are future-proof.

A highly accurate phonetic transcriber is also fundamental, since it enables better recognition results. Loquendo ASR is based on the same phonetic transcriber as Loquendo TTS, whose accuracy is tested both automatically and by means of very thorough human listening. Loquendo Speaker Verification is an extension to the ASR module that enables more accurate verification by combining speaker and knowledge verification (i.e. by matching "who said it" with "what was said").

ASR tuning (e.g. to the environment or to the speaker) and the ability to learn from the field are key factors for success or failure. They depend on the availability of the right tools, such as the Acoustic Model Adaptation Tool or the Phonetic Learning Tool, rather than on costly professional services. An embedded denoising module significantly improves performance in noisy environments by cleaning up the signal while computing spectral parameters.

In addition to providing all the functionalities described above, Loquendo ASR can also perform more specialized tasks, such as the "word spotting" function, which recognizes keywords within audio streams, and the definition of "garbage rules", which match arbitrary short spoken sequences not modeled by the grammar (expressions like "Um", "Er…", "Well", "Let me think", etc.). The latter in particular adds flexibility to the use of traditional grammars, giving the user a more natural interaction experience.

We hope you have found this information useful, and that you'll now have the courage to open up the hood and take a look inside a speech recognition engine. Of course, do not hesitate to contact us to try out Loquendo ASR for yourself. You will find all the features mentioned above in a technology designed both to give voice to your customers and to help you understand them better than ever.

Simon Parr

Speech Technology Magazine | www.speechtechmag.com
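As an illustration of why N-best hypotheses with reliable confidence values matter for dialogue-flow management, here is a minimal sketch of how an application might choose its next dialogue step from a recognizer's N-best list. It is not Loquendo's API: the `Hypothesis` type, the `next_dialogue_step` function, the assumption that confidences lie between 0 and 1, and the three threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


# Illustrative thresholds; real values would be tuned on field data.
ACCEPT = 0.80   # accept without confirmation at or above this score
CONFIRM = 0.50  # below this score, ask the caller to repeat
MARGIN = 0.15   # hypotheses closer than this are treated as ambiguous


def next_dialogue_step(nbest: List[Hypothesis]) -> Tuple[str, List[str]]:
    """Return (action, candidate texts) for the dialogue manager."""
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    if not ranked or ranked[0].confidence < CONFIRM:
        return ("reprompt", [])

    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0

    # High score and a clear gap to the second hypothesis: accept directly.
    if best.confidence >= ACCEPT and best.confidence - runner_up >= MARGIN:
        return ("accept", [best.text])

    # Several usable hypotheses with similar scores: let the caller choose.
    close = [
        h.text
        for h in ranked
        if h.confidence >= CONFIRM and best.confidence - h.confidence < MARGIN
    ]
    if len(close) > 1:
        return ("disambiguate", close)

    # One plausible hypothesis, but not certain enough to accept silently.
    return ("confirm", [best.text])
```

With these assumed thresholds, a list containing "Boston" at 0.92 and "Austin" at 0.40 would be accepted outright, while "Boston" at 0.62 and "Austin" at 0.58 would lead to a disambiguation prompt ("Did you say Boston or Austin?"). The decision is only as good as the confidence values behind it, which is why their reliability is listed among the requirements.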