StonyCreekBare writes "A client wants to build a kiosk system intended to interact with the user entirely via speech. Speech recognition is absolutely key to the success of the project, so an excellent speech recognition engine is essential.
Key requirements are speaker independence and a large vocabulary, with a great deal of flexibility for recognizing arbitrary speech. The system needs to interact with arbitrary speakers on a walk-up basis.
I have built a reasonable proof-of-concept prototype using an L&H/Windows-based system. I was quite pleased with the overall performance, and believe an optimized system could do even better. My goal is not so much to improve the recognition accuracy (although there is room for improvement) as to improve the system's reliability and to have more control at the system level.
There seem to be two candidates to supply the engine: Microsoft and Nuance.
The Microsoft Speech SDK has the unfortunate drawback of being innately wedded to Windows, and all the other viable systems (such as L&H and ViaVoice) seem to have been acquired by Nuance. Microsoft's system seems to require a lot of training to perform well, which is unacceptable; at least the L&H system is truly speaker independent. I would greatly prefer a Linux or BSD solution, if a viable one exists.
I have seen some other systems, mostly proprietary ones for telephony applications (Sprint's, to name one). I hear about other systems such as Sphinx from Carnegie Mellon and a system from Philips, but I do not know much about either and do not know anyone actually using them.
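(For a walk-up kiosk, speaker-independent engines like Sphinx typically get their reliability from a constrained grammar rather than open dictation. As a rough illustration only — the grammar name and command words below are hypothetical placeholders, not from any real deployment — a JSGF grammar for a small kiosk command set might look like this:)

```
#JSGF V1.0;
grammar kiosk;

// Top-level rule: an optional politeness word, then a command.
public <request> = [please] <command>;

<command> = show <topic> | repeat that | start over;
<topic>   = directions | hours | help;
```

Both CMU Sphinx and SAPI-style engines accept grammars in this general style, which is what makes speaker independence practical without per-user training.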
What are Slashdot users' experiences with the various systems available? Have I overlooked any good candidates? What is the "bleeding edge" in reliable speech recognition? Am I going to be forced to use Windows?"