Speech Technology - June 2008 - (Page 18) COVER STORY

information are better conveyed in certain formats. An obvious example: The ideal form for musical information is audio, whereas maps and location-based services are ideally visual.

Thus, multimodal interface designers need to be particularly canny when delivering functionality on a given application. “Functions will not map one-to-one between a phone and a multimodal interface,” Olvera cautions. For instance, if an application has 10 functions, it might make sense to offer, say, six of them through a GUI or five through a VUI. Additionally, some of that functionality might overlap in each mode. “So there will be certain functions that work on one method [instead of] another one,” Olvera says.

Functionality must be very targeted. Overcomplicate the device, according to Phillips, and users get confused. “We have a fairly rich set of functionality, but we don’t push it so hard on first-time users,” he says. “There’s hidden functionality.”

Mitby echoes the sentiment. “We could technically do a lot more with speech than we actually do,” he says, “but we’d rather target it at these use cases that really matter and focus on problems where we can put a lot of resources instead of being a be-all-and-end-all with speech. We want it to really work for a smaller set of tasks.”

For Ghanekar, that meant identifying key pain points that still exist on mobile devices, namely, text input. He found that users don’t want to type long, complex queries, and voice was the natural fix for that problem.

Ultimately, voice is a tool in the same way a mouse attached to a computer is a tool. “To be honest with you, I’m a little concerned with pushing multimodality as an end to itself,” Meisel says. “That’s like technology looking for a solution.” The issue, he explains, is figuring out the best way to deliver satisfaction to a customer to help him achieve his given objective.

Yet, it’s more common to see a consumer interacting with a device through a typepad or touchscreen than by voice. Users are still hesitant about speech thanks to years of bad deployments and shoddy technology. Mitby confesses that when he started at Tellme eight years ago, he always felt slightly embarrassed to tell people that he worked in the speech industry. However, he’s also discovered that as interfaces have improved over the years, consumer confidence has grown.

Indeed, Microsoft announced in May that its software, previously exclusive to Ford Sync, will be loaded into select Kia and Hyundai cars by November. It’s rumored that the Kia and Hyundai systems will come with additional features, with some bloggers speculating about voice controls for navigation devices and security features.

“[Speech technology] needs adoption, and people need to see it works for tasks that matter,” Mitby says. Ultimately, he believes that if there are minor problems, but an application provides consistent value, the user will be forgiving.

“We’re in a sweet spot where the technology is good enough to produce very good applications that are usable by the average person,” says Todd Emerson, director of solutions engineering for Medio Systems, which partnered with U.K.-based speech recognition provider Novauris to design Verizon’s Get It Now service, allowing enabled phones to download certain applications. “As people start using these applications more, the consumer will get smarter about how to use the application, and the technology will get smarter about how to work with the consumer.”

Who’s in Control?

Designers occasionally also have to consider carrier control. On North American feature phones, vendors need permission from the telcos to access application programming interfaces, like recording audio. That’s why vlingo Find, for instance, is currently enabled on Sprint phones. There’s more openness when designing applications for smartphones. “Now the problem is: Who’s servicing you?” Olvera states. “AT&T? Sprint? Each one has different bandwidths in different regions with different data transfer speeds. What works in one device in one location may not work for another device in the same location.”

Emerson says that some of the primary constraints Medio encountered when designing Get It Now were the restrictions around maintaining Verizon’s brand preferences. Pixels around selected boxes, for instance, had to be a certain size and not interfere with the company’s branding colors.

Because of Medio’s close working relationship with Verizon, Emerson was cautious when asked how his company and Novauris worked to create the best interface while still conforming to the carrier’s standards. “Sometimes you don’t,” he says. “I guess the answer is you take the constraints that you have and you do your best to work within them and create a product that would be the most useful. Then you go back to the carrier customer, show them reports, do [audiovisual] testing to show them alternatives that might work better.”