Mozilla's VP of Technology Strategy, Sean White, writes:
I'm excited to announce the initial release of Mozilla's open source speech recognition model that has an accuracy approaching what humans can perceive when listening to the same recordings... There are only a few commercial quality speech recognition services available, dominated by a small number of large companies. This reduces user choice and available features for startups, researchers or even larger companies that want to speech-enable their products and services. This is why we started DeepSpeech as an open source project.
Together with a community of likeminded developers, companies and researchers, we have applied sophisticated machine learning techniques and a variety of innovations to build a speech-to-text engine that has a word error rate of just 6.5% on LibriSpeech's test-clean dataset. vIn our initial release today, we have included pre-built packages for Python, NodeJS and a command-line binary that developers can use right away to experiment with speech recognition.
The announcement also touts the release of nearly 400,000 recordings
-- downloadable by anyone -- as the first offering from Project Common Voice
, "the world's second largest publicly available voice dataset." It launched in July "to make it easy for people to donate their voices to a publicly available database, and in doing so build a voice dataset that everyone can use to train new voice-enabled applications." And while they've started with English-language recordings, "we are working hard to ensure that Common Voice will support voice donations in multiple languages beginning in the first half of 2018."
"We at Mozilla believe technology should be open and accessible to all, and that includes voice... As the web expands beyond the 2D page, into the myriad ways where we connect to the Internet through new means like VR, AR, Speech, and languages, we'll continue our mission to ensure the Internet is a global public resource, open and accessible to all."