EveryZing's Stephen Baker Talks Voice Recognition Technology
WebmasterWorld's Brett Tabke sits down with EveryZing's Stephen Baker to discuss the technology offered by the newly renamed EveryZing, how SEOs can benefit from the service, and more.
Vanessa Zamora
Video Content Producer, SearchEngineWorld
10:30 pm on Feb. 17, 2009 (utc 0)
Transcript
Brett Tabke: This is Brett Tabke. Thanks for joining us. We are back here again today with uh, joining us is Stephen Baker. How big is EveryZing?
Stephen Baker: Uh, we are twenty people. We just finished our B round in June of this year. Most people probably know of us as PodZinger. We re-branded to EveryZing shortly after the B round.
Brett Tabke: Z-I-N-G
Stephen Baker: Z I N G. Right, exactly, and it's fun because we like to say you know that Tom is now CEO of everything. He has a big ego. But no, it's been uh, just twenty people mostly focused on technology, really talented development team and about four or five of us on the business side that are really getting the go-to-market effort going, and the B round brought us up to about 10 million dollars in funding. So we are hopefully good for the next couple of years, to really get this to take off.
Brett Tabke: So what kind of tech is behind EveryZing? I've been looking at it since the days of PodZinger and it's really fascinating what you guys are up to, the leading edge voice recognition?
Stephen Baker: Yeah, it's speech to text. So with voice recognition you've got the two paths. You've got phonetic-based recognition and speech to text. We certainly feel that to make multimedia a cohesive component of today's web infrastructure, text is really the way to go, so speech to text is at the core of what we do. We were spun out of a company called BBN Technologies. BBN has been around since 1948. Really a government R&D shop. They have a variety of strains of R&D going on, one of which happened to be speech to text, and in 2001 they received a grant from the US government, specifically from Homeland Security/Department of Defense, to create the next generation speech to text platform. They received a $50 million grant to do that, they delivered it, but BBN is not in the business of commercializing their products or serving customers outside of the public sector, so they decided to take that library of IP, drop it into this new company called PodZinger and now EveryZing, and that is what we became the beneficiary of, so we got essentially a very powerful speech recognition engine that focuses on speech to text transcription and it's backed by about 50 million dollars of R&D by some other <inaudible> in the space.
Brett Tabke: So you guys crawl the web looking for podcasts, audio content, and...
Stephen Baker: That was kind of the first business model. What we are doing now is we're really building a search and publishing platform on top of this recognition engine, with the goal of providing it to the media space, to the content producers or to the publishers, and then what they can do is upload their content, or actually we make it very easy, we just crawl the content, we stream it through our servers, create a text transcript, time code it, and then funnel it back to them so it syncs up with the original media. And once we get that text transcript we can do really three things with it: 1) we can publish it to SEO-optimized pages, so now you are visible to your regular web crawlers, 2) we can host it in a site search index or in an enterprise search index so now it is easier to discover and consume content on the media company site itself, and then 3) because we have got this rich body of content we can actually use it for ad targeting, so they can use the content of the transcript to figure out what's the right way to call ads, whether it be display ads or whether you boil it down to a bag of words that you send over to you know Google or Yahoo for your search marketing results, or whatever it might be. So that's really kind of the ecosystem that we are building, it's really an end to end solution focused on the media industry, on content providers specifically.
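Editor's note: the time-coded transcript and the "bag of words" Baker mentions can be illustrated with a small sketch. This is not EveryZing's implementation; the `Segment` type, the function names, the stopword list, and the sample transcript are all hypothetical, chosen only to show how text tied to timestamps lets a search hit jump to the right point in the media and how a transcript can be reduced to keyword counts for ad targeting.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-coded piece of a transcript (hypothetical structure)."""
    start_sec: float  # offset into the original audio or video
    text: str         # words recognized in this stretch of audio


# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_offsets(segments, term):
    """Return the start times of every segment that contains the term."""
    term = term.lower()
    return [s.start_sec for s in segments if term in tokenize(s.text)]


def bag_of_words(segments, top_n=5):
    """Count non-stopword tokens across the transcript; return the top_n."""
    counts = Counter(
        word
        for s in segments
        for word in tokenize(s.text)
        if word not in STOPWORDS
    )
    return counts.most_common(top_n)


# Made-up sample transcript, for illustration only.
transcript = [
    Segment(0.0, "Welcome to the show about speech recognition"),
    Segment(4.5, "Speech to text makes audio searchable"),
    Segment(9.0, "Search engines index the text of the transcript"),
]

if __name__ == "__main__":
    print(find_offsets(transcript, "speech"))  # seek points for a search hit
    print(bag_of_words(transcript, top_n=2))   # keywords to pass to an ad system
```

In this sketch a player could seek to any offset returned by `find_offsets`, while the output of `bag_of_words` stands in for the keyword list that would be handed to an ad or search marketing system.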
Brett Tabke: A lot of webmasters doing podcasts these days, what should they know about EveryZing, how can it help them?
Stephen Baker: Yeah, I mean there are a couple of things. We're certainly not positioned as being SEO consultants, search engine marketing consultants, to any degree. We really view ourselves as providing technology, or at least the output of our technology helps the SEO community do what they need to, for and on behalf of their customers. So there are a couple of things. You can get familiar with the technology just by coming to EveryZing.com, uploading your content, and you will actually see how we process the text. You will see it put out on the EveryZing domain and you will see it consumable in our player. The other thing you will see is because we have optimized EveryZing, it should start appearing in Google and Yahoo results not too long after uploading it, and you'll see, for things like Mitt Romney video or Michael Vick, that we are actually placing top ten on the Google and Yahoo SERPs. So they can get familiar with it that way and then of course contact us if they are working with media companies or content producers and we can figure out a way to establish a commercial relationship.
Brett Tabke: And you are not just doing audio, you are actually crawling the audio portion of videos too.
Stephen Baker: That's right. Yeah, what we want to do with any form of multimedia is separate out the audio track, so if it's a video file, pull out the audio track, or if it's pure audio, just pull out the spoken word.
Brett Tabke: Well now, excuse me. Some of the stuff we've done has just popped up on EveryZing, we've not submitted it or anything. Are you guys crawling Google or...
Stephen Baker: You know what we are doing. We are plugged into the YouTube API, and we pull down the tens of thousands of video uploads that we see through the API on a daily basis. Now we do that for a couple of reasons, one it gives us just a better test bed to train our algorithms against, the more video we throw at it the more intelligent our algorithms become, and so that is probably how you are beginning to see that stuff. The disadvantage, the only difference when you see YouTube content on our site is we've snapped in the YouTube player, because we had to, as opposed to our player, but we still do expose the transcript, so you are definitely getting an SEO uptick in that, and absolutely if you've got content
Brett Tabke: It's awesome backlinks
Stephen Baker: It's good backlinks. You can upload it to us or put it on YouTube and we are intercepting it that way.
Brett Tabke: Any plans in the near future to get into pure video?
Stephen Baker: Potentially, but not yet. I really see you know our business model is, we want to work with the Viacoms and the NBCs of the world and help them. I guess if I take a step back, to me there are two types of video, at least from an SEO perspective. There is video that you want to virally release, to get the message out. You don't really care about getting the traffic back to your site. It's more about the message. And then there is sort of the content, sort of the areas where YouTube and Google have had problems, you know kind of your Comedy Central content, where the goal is to actually get people back to Comedy Central, not to consume it on YouTube itself, and we are really optimized for that latter case where we can actually help the content producers themselves unlock the content of their video/multimedia and you know SEO it and get it out there. So that is really the environment that we work best in and sort of from a business model how we are aligned, but that said, for marketers and for search engine marketers who have got you know content, video or audio content, that they are uploading to YouTube, we are certainly picking that up and getting you an SEO uplift as well.
Brett Tabke: 'Cause it just seems like a natural transition to go from voice recognition to character recognition. You know some of the video engines are doing OCR on every frame of video.
Stephen Baker: It's amazing, right. There are a couple of things. When I was at FAST, we experimented with OCR in two aspects: one, OCR just to pull off the closed captioning feed and begin to do some text manipulation analysis of it; second, we were starting to do frame analysis. Can you find the baseball being hit? Can you find the Nike swoosh? Those kinds of things, and there is certainly good technology out there, but what we found is that from a CPU standpoint, to process the 30 frames a second you are seeing in most professionally created video, you just need massive server arrays. So the technology exists and it's good, but I think its commercial deployment is probably some time off because there is some optimization that needs to take place there. So for the time being, we are focused 100 percent on speech to text. It's certainly not a stretch, you know if we are successful as a company, and our business model takes off, to invest in other forms of R&D, but for right now we are going to stick to the audio because that is the most successful for us.
Brett Tabke: But isn't audio just as intensive almost?
Stephen Baker: Not the way we've designed the algorithms. We can process audio at about a 0.75 to 1 ratio, so 1 minute of content we can process in 45 seconds. We've done that through significant optimization of our algorithms and how things are set up on our server array, and that enables us to do some really cool things. For example, CBS Radio is giving us live feeds that we are transcribing in real time, with about a thirty second delay, spitting back to their websites, and they are using that for service related content, scroll and targeted ads, those sort of things. So we really have an advantage right now in that processing speed as it relates to audio.
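Editor's note: the 0.75 to 1 ratio Baker quotes is what speech recognition work usually calls a real-time factor (processing time divided by audio duration). The arithmetic can be sketched as below; the function names are hypothetical, and the per-stream capacity figure is a back-of-the-envelope consequence of the quoted ratio, not a number from the interview.

```python
def processing_seconds(content_seconds, real_time_factor=0.75):
    """Time needed to transcribe a clip at the given real-time factor."""
    return content_seconds * real_time_factor


def can_keep_up_live(real_time_factor):
    """A live feed can be followed only if processing is no slower than playback."""
    return real_time_factor <= 1.0


def audio_hours_per_day(real_time_factor=0.75):
    """Hours of audio one continuously busy processing stream could cover in 24 hours."""
    return 24 / real_time_factor


if __name__ == "__main__":
    print(processing_seconds(60))    # one minute of audio, as in the interview
    print(processing_seconds(3600))  # one hour of audio
    print(can_keep_up_live(0.75))
    print(audio_hours_per_day())
```

A factor below 1.0 is what makes the live CBS Radio case possible: the transcriber finishes each stretch of audio before the next one has fully arrived, so the delay stays bounded rather than growing over the length of the broadcast.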
Brett Tabke: Speaking of time, we are about out of it. I appreciate you taking the time to be with us.