There's no perfect solution, but something that works for 60% might already be better than nothing.
I work in the closed captioning industry, and I'd say anything less than 95% accuracy is actually WORSE than nothing. Automatic Speech Recognition (ASR) has no concept of context or situational awareness. The mistakes it makes tend to be concentrated not in the simple, common words and phrases, but in the nouns, especially proper nouns: names of people, places, companies, products, etc.

Even at 80% accuracy, which is quite good for the current best speaker-independent ASR systems, you're looking at 2 words out of every 10 being substituted with the wrong word, completely changing the meaning of the phrases. Imagine the chaos if (major news network)'s closed captioning reported some celebrity or politician as saying "I'm not a fan of Jews." when they actually said "I'm not a fan of juice." (Which would be 83% accurate!) Wars have been started over one misheard word in a thousand; imagine how bad 200 out of 1,000 would be.
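For the curious, that "83% accurate" figure falls out of the standard word error rate (WER) arithmetic: word-level edit distance (substitutions + insertions + deletions) divided by the length of the reference transcript. A minimal sketch, assuming plain whitespace tokenization (the function name here is mine, not from any captioning tool):

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER, where WER is the word-level edit
    distance (substitutions + insertions + deletions) divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard dynamic-programming edit distance, over words
    # instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1 - d[len(ref)][len(hyp)] / len(ref)

# One substitution out of six words: 5/6, roughly 83% "accurate" --
# and a completely different meaning.
print(word_accuracy("I'm not a fan of juice.",
                    "I'm not a fan of Jews."))
```

Note that by this metric a caption can score very well while still being catastrophically wrong, which is exactly the point: the errors land on the one word that carries the meaning.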
Here's an article about a HUMAN transcription error that caused a pretty major ruckus. Now imagine this kind of problem being an order of magnitude worse:
People who lost hearing later in life tend to do better with high-error-rate ASR because they know what words sound like and can work out the easy substitutions (e.g. "juice" vs. "Jews", "election" vs. "erection"), but people who were born deaf or lost hearing before language acquisition can't easily make those substitutions in their heads, because they don't "hear" the word sounds when they read them.