Comment Re:15% of the time (Score 3, Interesting) 155
Are you using a dictation program, where every word matters? Or are you basing your impression of accuracy on a voice assistant like Siri or Alexa? In the latter case, you could say "Play me some tunes by the Allman Brothers Band", and the system could recognize "Play the Allmans", and you wouldn't know the difference, even though the word error rate was 77%.
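For anyone who wants to check that figure, here is a minimal sketch of the usual word error rate calculation: word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The function name `word_error_rate` and the lowercase, whitespace-only tokenization are assumptions made for illustration; real scoring tools normalize text far more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits needed to turn the reference words seen
    # so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word was dropped
            insertion = curr[j - 1] + 1   # extra word was recognized
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "Play me some tunes by the Allman Brothers Band",
    "Play the Allmans",
)
print(f"{wer:.1%}")
```

On this example the reference has 9 words and the best alignment needs 6 deletions plus 1 substitution ("Allman" to "Allmans"), so the result is 7/9, in line with the 77% quoted above.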
I have been doing ASR R&D for about 30 years, and 15%, on average, for the speaker-independent error rate on real-world (not laboratory) tasks is close enough to state of the art. There is of course huge variation, based on speaking style, topic, noise conditions, microphone transfer function, etc. I would estimate the cross-speaker spread at about 10 percentage points either side of that average (so 90% of speakers will experience somewhere between 5% and 25% word error rate). That a particular sub-population is out on the high end of that distribution is not surprising at all, particularly if you understand the weaknesses (and, yes, biases) of the model-building (both acoustic and language) algorithms.
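As a rough sanity check on that range, the sketch below assumes per-speaker word error rates are approximately normally distributed around the average. That is an assumption made only for illustration (real distributions are bounded at 0% and tend to have a long upper tail), and the helper names are hypothetical. Under it, a 90% range of 5% to 25% corresponds to a standard deviation of about 6 percentage points; a standard deviation of a full 10 points would put only about two thirds of speakers inside the same range.

```python
from statistics import NormalDist


def implied_std_dev(low: float, high: float, coverage: float) -> float:
    """Std dev of a normal distribution whose central `coverage` mass spans [low, high]."""
    # z-score that leaves (1 - coverage) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (high - low) / (2 * z)


def share_within(mean: float, std_dev: float, low: float, high: float) -> float:
    """Fraction of a normal(mean, std_dev) population falling inside [low, high]."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(high) - dist.cdf(low)


# All values are word error rates in percentage points.
sd = implied_std_dev(low=5.0, high=25.0, coverage=0.90)
print(f"implied std dev: {sd:.2f} points")

share = share_within(mean=15.0, std_dev=10.0, low=5.0, high=25.0)
print(f"share within 5-25% if std dev were 10 points: {share:.1%}")
```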
Speaker adaptation to the rescue!