Re: Speech Recognition software?
I'd like to know the answer to this question, too. (BTW, this is my first Slashdot post, so sorry if I break some convention.)
I've been looking for OSS speech recognition, text processing, and speech synthesis tools for quite some time now and haven't found anything that is royalty-free and subscription-free.
If I can't find it, maybe I'll try building it myself. I have been toying with the idea of starting a project to create a natural language toolkit. I envision a set of modules like the following:
- Audio sampling
- Speech-to-phoneme conversion (probably outputting IPA symbols plus duration and pitch information).
- A multilingual dictionary (spelling, pronunciation, language-neutral definition, etc.) that can be expanded to an arbitrary number of word senses, languages, and dialects.
- Software that matches phonemes to word senses in the dictionary. (Each sentence is mapped to a sequence of actions and actors in a multi-dimensional co-ordinate space.)
- Software that matches written text to word senses.
- Software that can output text or speech from the internal language-neutral format.
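To make the module boundaries above a bit more concrete, here is a toy sketch in Python. Every class and function name is hypothetical, the IPA transcriptions are rough, and the "language-neutral format" is reduced to a plain concept ID string. Audio sampling and acoustic modeling are skipped entirely: the input is assumed to be an already-recognized phone stream, and the phoneme matcher uses simple greedy longest-match lookup rather than the statistical decoding a real recognizer would need.

```python
# Toy sketch of the module boundaries; all names here are hypothetical.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Phone:
    """One recognized sound: an IPA symbol plus duration and pitch (placeholder units)."""
    ipa: str
    duration_ms: float = 0.0
    pitch_hz: float = 0.0


@dataclass
class WordSense:
    """A language-neutral meaning with per-language spellings and pronunciations."""
    concept: str                                        # simplified language-neutral ID
    spellings: dict = field(default_factory=dict)       # language code -> written form
    pronunciations: dict = field(default_factory=dict)  # language code -> tuple of IPA symbols


class Dictionary:
    """Linear-scan lookup; a real dictionary would index by spelling and pronunciation."""

    def __init__(self):
        self.senses = []

    def add(self, sense):
        self.senses.append(sense)

    def by_spelling(self, lang, word):
        return [s for s in self.senses if s.spellings.get(lang) == word]

    def by_pronunciation(self, lang, ipa):
        return [s for s in self.senses if s.pronunciations.get(lang) == ipa]


def phones_to_senses(phones, lang, dictionary):
    """Segment a phone stream into word senses by greedy longest match."""
    symbols = tuple(p.ipa for p in phones)
    result, i = [], 0
    while i < len(symbols):
        for j in range(len(symbols), i, -1):  # try the longest span first
            matches = dictionary.by_pronunciation(lang, symbols[i:j])
            if matches:
                result.append(matches[0])  # a real system would rank competing senses
                i = j
                break
        else:
            raise ValueError(f"no word matches phones starting at position {i}")
    return result


def text_to_senses(text, lang, dictionary):
    """Map whitespace-separated written words to word senses."""
    senses = []
    for word in text.lower().split():
        matches = dictionary.by_spelling(lang, word)
        if not matches:
            raise ValueError(f"unknown word: {word!r}")
        senses.append(matches[0])
    return senses


def render_text(senses, lang):
    """Generate text in the target language from the language-neutral senses."""
    return " ".join(s.spellings[lang] for s in senses)


# Usage: German speech (as phones) or German text in, English text out.
d = Dictionary()
d.add(WordSense("GREETING", {"de": "hallo", "en": "hello"},
                {"de": ("h", "a", "l", "o"), "en": ("h", "ə", "l", "oʊ")}))
d.add(WordSense("WORLD", {"de": "welt", "en": "world"},
                {"de": ("v", "ɛ", "l", "t"), "en": ("w", "ɜː", "l", "d")}))

heard = [Phone(s) for s in ("h", "a", "l", "o", "v", "ɛ", "l", "t")]
print(render_text(phones_to_senses(heard, "de", d), "en"))
print(render_text(text_to_senses("Hallo Welt", "de", d), "en"))
```

The point of routing everything through word senses is that the speech and text front ends never need to know about each other: adding a language means adding spellings and pronunciations to existing senses, not writing a new translator for every language pair.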
By combining these kinds of tools, bidirectional speech interfaces, real-time translators (on PDAs?), and a lot of other things become theoretically possible. (For instance, I would like my cell phone to automatically translate into English any call I get that isn't in German or English.) As an OSS project, it could make speech-enabled applications really cheap and ubiquitous.
Lately I've been reading every book I can find to understand the problems and the solutions that have been tried. I think this is a good fit for an OSS project because the hardest parts (the dictionary and the grammar rule system) lend themselves so well to parallel work. I have done a few calculations, and I estimate that a project like this needs hundreds of man-years to build up hundreds of thousands of word senses and thousands of grammatical rules. This may be what has put people off until now.
However, if the work were shared by 10-100 people, or even more, it could actually happen pretty quickly. And anyway, in certain specialized applications a good interface may need only a few hundred words.
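A back-of-the-envelope sketch of the scaling argument: divide the total effort by the number of contributors and by the share of their time they give the project. The 200 person-years and the 25% part-time figure below are hypothetical placeholders, not the results of the calculations mentioned above. The formula also assumes the work splits evenly and ignores coordination overhead. Independent dictionary entries make that assumption more plausible here than for most software, but it is still optimistic.

```python
# Back-of-the-envelope effort split; the inputs below are hypothetical placeholders.
def calendar_years(person_years, contributors, fraction_of_full_time=1.0):
    """Elapsed years if the work splits evenly and contributors never block each other."""
    if contributors <= 0 or not 0 < fraction_of_full_time <= 1:
        raise ValueError("need at least one contributor and a time fraction in (0, 1]")
    return person_years / (contributors * fraction_of_full_time)


TOTAL_PERSON_YEARS = 200  # hypothetical stand-in for "hundreds of man-years"

for n in (10, 100, 1000):
    full = calendar_years(TOTAL_PERSON_YEARS, n)
    part = calendar_years(TOTAL_PERSON_YEARS, n, 0.25)  # volunteers at 25% time (assumed)
    print(f"{n:>5} contributors: {full:6.1f} years full-time, {part:6.1f} years at 25% time")
```

Under these assumptions, tens of contributors stretch the job to decades, while hundreds bring it down to single-digit years and thousands to under a year, which is why the number of likely contributors matters so much.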
Does anyone have any idea how many people might be expected to contribute to a project like this? If it is on the order of hundreds worldwide, then this could work. In fact, with thousands of contributors, it could gain enough critical mass to outpace similar purely commercial efforts.
Comments?