Comment Re:Too much too soon, or tackling wrong problem? (Score 1)
The conversational telephone speech (CTS) results I quoted above were achieved using a state-of-the-art research system running at under 10 times real time (10xRT), i.e., using less than 10 hours of processing to transcribe one hour of speech. The winning system in the 2004 DARPA EARS evaluation achieved a 15.2% word error rate (WER). For a system description, see this paper (requires an IEEE Xplore subscription). In 2004, many EARS teams matched, in real time, the performance their 10xRT systems had reached in 2003. Since the EARS program was killed after the 2004 evaluation and DARPA's focus has shifted to foreign languages (GALE), it is hard to predict the current state of the art in English CTS transcription, or when that level of performance will be available in commercial products.
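For readers unfamiliar with the two numbers above, here is a minimal sketch of how they are usually computed, assuming plain whitespace tokenization. WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words; the real-time factor is processing time divided by audio duration. The function names are hypothetical, and official evaluation scoring also applied text normalization rules that this sketch skips.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Hypothetical helper: WER over whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_hours, audio_hours):
    """Hypothetical helper: e.g. 9.5 h to decode 1 h of audio is 9.5xRT."""
    return processing_hours / audio_hours


if __name__ == "__main__":
    # One deleted word out of six reference words gives WER = 1/6 (about 16.7%).
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(real_time_factor(9.5, 1.0))
```

So a 15.2% WER means roughly 15 word errors per 100 reference words, and a "10xRT" budget means the real-time factor must stay below 10.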
To correct my earlier post: Arabic broadcast news (BN) transcription error rates are still around 20%, and the Mandarin Chinese BN character error rate (CER) is close to 10%.
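Mandarin is scored by characters rather than words because written Chinese has no spaces between words. A minimal sketch, assuming whitespace is simply ignored and each remaining character is one token (the helper names are hypothetical): CER is the same edit-distance computation as WER, just run over characters.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Hypothetical helper: CER with each non-space character as one token."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character (气 -> 汽) out of six gives CER = 1/6.
    print(character_error_rate("今天天气很好", "今天天汽很好"))
```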