Sunday, February 28, 2010

Speech To Speech

When Google talks, people listen. Recently, Google has been saying that sometime in the near future, when you talk, people can listen in other languages, thanks to your phone and their software. Franz Och, the head of Google's translation services, has said "We think speech-to-speech translation should be possible and work reasonably well in a few years' time." I'm not the only one who's skeptical.

In 2007, I evaluated a state-of-the-art speech-to-speech machine translation (StS MT) system that was built for military applications (think of the Somali pirate crisis from 2009 or any scene from The Hurt Locker). I don't think it delivered. The issue is quality. StS MT can "work" if:

a) Both parties are extremely cooperative and are willing to put some work into the exchange. (E.g., training the system on their voices; repeating things, maybe more than once, if the result is unclear the first time; tolerating the inherent delays.) This could mean a life-or-death situation where the alternative -- no translation -- is worse than a bad translation. (Although the stakes could be higher, even deadly, if the translation is faulty.)

b) The comprehension of the content is held to a low standard. E.g., two businesspeople or recreational chatters who want to feel like they're getting to know each other, not parties trying to hammer out the fine details of a legal agreement.

c) There is no alternative. If the parties can instead type content into an online text-based MT system, that immediately removes one source of error: the speech recognition step.

As long as the sources of error are as great as they are now, I have trouble thinking of many contexts where people would be willing to tolerate the flaws. Maybe chatters who are only looking for entertainment and have no bottom line regarding accuracy. In war / emergency contexts, perhaps. In business, I think the problems just about doom the effort, unless a cultural adjustment makes people value "meeting" someone in this way even when the comprehension is shaky.

I was, however, impressed by how well the speech recognition worked with my voice when I trained the system to recognize me. Perhaps if Google trains the system on enough people, just about any new speaker who comes along will sound close enough to one of them. The speech-to-text part of the problem just might be solvable for a large segment of people.

But that leaves the machine translation link in the chain, and there is little reason to expect a quantum leap there any time soon. I took Mr. Och's own quotation above and translated it into Spanish and then back into English using Google's text-to-text machine translation. It did a pretty good job, but it changed his use of the word "should" to "must". That's the sort of thing you don't want to have happen when a Somali pirate has his gun trained on you.
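
For anyone who wants to repeat that round-trip check on their own sentences, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `translate` is a hypothetical callable you would have to supply yourself (no real translation service is called here), and `fake_translate` is a stand-in that merely mimics the "should" to "must" drift described above so the comparison has something to find.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
# Any real MT backend would need to be wrapped to fit this shape.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator,
               source: str = "en", pivot: str = "es") -> str:
    """Translate text into the pivot language and back again."""
    forward = translate(text, source, pivot)
    return translate(forward, pivot, source)


def word_changes(original: str, returned: str) -> List[Tuple[str, str]]:
    """List (before, after) word spans that differ between the two texts."""
    a, b = original.split(), returned.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


def fake_translate(text: str, src: str, dst: str) -> str:
    """Stand-in translator (an assumption, not real MT output).

    Going to the pivot language it returns the text unchanged; coming
    back to English it swaps "should" for "must" to imitate the drift.
    """
    if dst == "en":
        return text.replace("should", "must")
    return text


if __name__ == "__main__":
    quote = "speech-to-speech translation should be possible"
    back = round_trip(quote, fake_translate)
    print(word_changes(quote, back))
```

A word-level diff like this only flags that something changed; deciding whether a change matters (as "should" versus "must" clearly does) still takes a human reader.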
