rhysjj - Slashdot User

Comment Re:Here's what you'll look like (Score 2, Informative)

by rhysjj on Thursday October 26, 2006 @09:06AM (#16592102) Attached to: 'Tower of Babel' Translator Under Development

It's true that articulatory speech recognition should be easier than automatic speech recognition (ASR) based on waveform analysis alone. It's massively unfortunate that ASR research has, for at least the past 20 years, concentrated mostly on the latter and not the former. Janet Baker, whose MIT PhD introduced Hidden Markov Model (HMM)-based ASR and opened the door to companies such as Dragon (which she and her husband founded), is herself now saying that HMMs are rubbish for speech recognition. I desperately hope that, through this CMU project and others, people will start to take note of this.

I think you're entirely correct that the machine translation (MT) stage is a bolt-on in this particular project. This project is, I think, a vehicle for articulatory ASR rather than MT. But I wouldn't be so keen to dismiss MT efforts altogether. It's true that in some ways the current deployable systems make gross assumptions about language, which may be even worse than the assumptions ASR systems make about speech (that's particularly true of purely statistical MT systems). But Google and others have apparently shown that, with a large enough corpus, you can get results that go beyond simple phrase-book look-up quality.

There's one main question facing the researchers at CMU, I think: whether people will be happy to stick a dozen electrodes on their face in order to achieve speech-to-speech translation, or whether they'll prefer to speak into a microphone and have a speech synthesiser (e.g. the open-source Festival, partly developed at CMU) speak the result. I'm not entirely convinced they'll accept the electrodes, but I'd be absolutely delighted to be proved wrong.
