Streaming Media - December 2007/January 2008 - (Page 48)

audio, or other types of streaming content, really nobody is doing search in the sense of, ‘Let’s get in and really understand this content, analyze it, and make it searchable.’ Some companies are doing simple things like speech detection analysis or looking at waveforms in audio and doing similarity analysis, but it’s still basic and doesn’t work consistently well.”

However, there are companies beginning to move forward with some interesting approaches. Chandratillake says blinkx is using speech recognition technology in conjunction with visual analysis of videos to provide a way to expose what’s inside the video or audio.

“Our obsession is how much can you get the computer to actually understand the video itself or, in our case, the audio as well. To that end, we specialize in technology that not only finds the video that’s out there on the web and reads text around it, but watches the video and also listens,” he says.

Figure 1. Using blinkx, you can track videos by keyword. This search found every video that mentions Boston Red Sox pitcher Curt Schilling.

Gary Price, who runs the website ResourceShelf and who is also the director of online information sources at Ask, sees TVEyes and Nexidia as two companies pushing multimedia search forward.

“Right now for some of the technology I’ve seen, whether it be what TVEyes or Nexidia is doing, is really what I call the state of the art, that is, being able to work with [multiple] languages and breaking down the text into phonetic sounds, so it’s much more accurate and cost-effective in terms of computing cycles and it’s much faster.”

He adds that a phonetic approach is also more effective than pure speech recognition, which tries to resolve each spoken word against a word in a dictionary and is much slower. Drew Langham, SVP of media at Nexidia, views this as a great advantage.
“We take any recorded audio or video source and break it into a purely phonetic index. So what happens is we create this index, depending on the processor you are using, at about [340] times real time. This means one hour of media gets processed in about 12 seconds and rendered searchable. When someone wants to search for a piece of information, they type a text query. It gets parsed to the phonetic equivalent and matched to the exact point in the audio or video where it was said.”

Figure 2. You can build a dashboard view with Nexidia, as in this look at a corporate call center, which enables managers to slice and dice information to monitor calls more efficiently and see what types of calls are most popular.

TVEyes monitors broadcasts for its subscriber clients using a hybrid of the phonetic and dictionary approach. David Ives, the company’s CEO, says TVEyes’ approach depends on the language being spoken. He admits there are limits to speech recognition technology, but he says it gets you further than looking at tags or other text information.

“We look at the audio track as the primary means of determining the content and context of what rich media is about. In a consumer-generated video like on YouTube, with lots of background music or a visual stunt, our approach would not be effective, but the vast majority of content has spoken content and we offer an economical way of determining the content,” he says.

Clients report that TVEyes is an effective way to monitor mentions in broadcasts, a task that would be impossible without this technology. Ellen Davis, senior