Open Source Speech Recognition
bedahr writes "The first version of the open source speech recognition suite simon was released.
It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model.
These components are united under an easy-to-use graphical user interface.
Simon can import dictionaries directly from Wiktionary (a subproject of Wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts.
It also provides means to train the language model with new samples and add new words."
been playing with it (Score:5, Interesting)
--
webmasters: personalized bookmarking [primadd.net] scripts for your site
wp and phpbb plugin available
Re:been playing with it (Score:5, Insightful)
Re: (Score:2, Funny)
Re:been playing with it (Score:5, Funny)
Re: (Score:3, Funny)
Re:been playing with it (Score:5, Informative)
Re: (Score:2)
You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.
My Mac does that as well, though the one time I tried it, I did not have much success. Maybe because I had a cold, maybe because I'm not a native speaker, so my fancy Mac hates my Slavic accent.
But it's quite a nice thing anyway.
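The prefix-word idea above can be sketched in a few lines. This is only an illustration, assuming the recognizer hands back plain text; the name `extract_command` and the default wake word are made up for the example and are not part of simon or Julius.

```python
import re

# Hypothetical wake word; any name the household agrees on would do.
WAKE_WORD = "computer"


def extract_command(utterance, wake_word=WAKE_WORD):
    """Return the command that follows the wake word, or None.

    Text that does not start with the wake word (for example, random
    sentences from the TV) is ignored.
    """
    pattern = rf"^\s*{re.escape(wake_word)}\b[\s,:]*(.+)$"
    match = re.match(pattern, utterance, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).strip().lower()


print(extract_command("Computer: Lights off"))
print(extract_command("Turn the lights off"))
```

As the comment notes, this does nothing against a TV show that uses the same prefix word; it only filters out speech that was never addressed to the machine.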
Re: (Score:2)
I have been exploring the voice computer control,
and having a name called before a direct command,
helps the command being directed properly.
I was using computer at first,
then next I evolved the process and
I used the name of the computer,
say "lisa, open do that"
combining naming and semantics.
I started using "dragon naturally speaking"
back in the nineties, and language control,
is not a substitute for keyboard and mouse
input control, but an additional control,
suitable in context.
with an o
Re: (Score:2)
I used greater-than and less-than symbols,
but they and the content in between were removed.
to demonstrate. say is
the command desirable would be:
"lisa, <.task1.> <.device.> to <.task2.> <.identity.>";
where
task = tell,
device = human||animal||computer,
task2 = do_something,
identity = me_registered||somebody_registered||computer_registered
task management completed.
Love to see it come true.
C.
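A rough sketch of how the command template above could be matched against recognized text. The slot names and values are copied from the comment; everything else (`SLOTS`, `build_pattern`, `parse_command`, the use of regular expressions) is an assumption made for illustration, not something the poster or simon provides.

```python
import re

# Slot values taken from the comment; a real system would load these
# from a registry of known devices and identities.
SLOTS = {
    "task1": ["tell"],
    "device": ["human", "animal", "computer"],
    "task2": ["do_something"],
    "identity": ["me_registered", "somebody_registered", "computer_registered"],
}


def build_pattern(name, slots):
    """Compile '<name>, <task1> <device> to <task2> <identity>' into a regex."""

    def alt(slot):
        options = "|".join(re.escape(value) for value in slots[slot])
        return rf"(?P<{slot}>{options})"

    body = r"\s+".join(
        [alt("task1"), alt("device"), "to", alt("task2"), alt("identity")]
    )
    return re.compile(rf"^{re.escape(name)},\s+{body}$", re.IGNORECASE)


def parse_command(utterance, name="lisa", slots=SLOTS):
    """Return a dict of filled slots, or None if the utterance does not fit."""
    match = build_pattern(name, slots).match(utterance.strip())
    return match.groupdict() if match else None


print(parse_command("lisa, tell computer to do_something me_registered"))
```

Utterances that use an unknown device or leave out the name are rejected outright, which is the point of combining naming and semantics.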
Re: (Score:2)
working on this for a long time and deserves
recognition:
http://cmusphinx.sourceforge.net/html/cmusphinx.php [sourceforge.net]
open source speech recognition project, university grade.
Re: (Score:2)
computerized lip reading:
http://science.slashdot.org/article.pl?sid=08/01/20/0141203 [slashdot.org]
have a camera add to the technology base,
improving accurate recognition of speech.
lips can't lie, unless they tell lies.
and you get what you give in return.
C.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Prikaz: Start Firefox.
Prikaz: Open new tab.
Zamyechaniye: It would be quicker to use a mouse.
Re: (Score:1)
Re: (Score:1)
Re: (Score:2, Interesting)
-- bedahr
Re:been playing with it (Score:5, Funny)
Re: (Score:2)
yes, i'm a nerd. =P
I could see how it could be useful in some apps (Score:2)
The software has to be intelligent to know what to do when you press a button and say "shopping list, plums" etc.
I don't think speech recognition is good
Re: (Score:1)
It's funny how many Slashdot readers immediately think of science fair type applications when something interesting like an open source voice reco app is released. Not that such applications don't have merit, but when compared with high dollar speech application companies like Nuance (http://www.nuance.com/) and Holly (http://www.holly-connects.com/), the real value of open source speech recognition becomes apparent - it's another critical piece of software needed to create a truly open source voice pl
Mask out known audio? (Score:2)
Re: (Score:2)
Re: (Score:2)
It's a speaker-independent, continuous speech recognizer that can be configured to do everything from simple commands to full-text dictation. It's not Dragon's stuff, but it's pretty good.
They even have a pure Java version of it: http://cmusphinx.sourceforge.net/sphinx4/ [sourceforge.net]
Are they productive? (Score:4, Insightful)
Re: (Score:2, Insightful)
Re: (Score:2, Funny)
Re: (Score:1)
Our training persons have spastic disabilities.
-- bedahr
Re:Are they productive? (Score:4, Insightful)
Re: (Score:1)
Re: (Score:1)
Re:Are they productive? (Score:5, Funny)
For those not familiar with this meme (Score:4, Informative)
War stories.... (Score:2)
Re:Are they productive? (Score:5, Insightful)
So how much it improves your productivity depends on who you are.
Re: (Score:2)
Re: (Score:2)
The biggest problem with text to speech is simply having to train the engine. I found Dragon NaturallySpeaking 9 not too bad; it's training it to recognize your own unique vocalizations that is the problem. I think text-to-speech and voice recognition is a project that demands Wikipedia-like sourcing of voices in different noisy environments and using millions of samples of people's voices to improve the algorithms. I'm surprised no one at
Re: (Score:3, Informative)
You might want to have a look at the voxforge project [voxforge.org]
And this doesn't require changes in the algorithm - just in the model.
-- bedahr
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Also if they needed real conversation I'm certain they could do a lot better than what they're doing (i.e. partner with possibly other call centers
I use only computer dictation for medical notes (Score:3, Informative)
Re: (Score:1)
"Since Julius itself is a language-independent decoding program, you can make a recognizer of a language if given an appropriate language model and acoustic model for the target language. The recognition accuracy largely depends on the models. "
"We currently have a sample English acoustic model trained from the WSJ database. According to the license of the database, this mo
Re: (Score:1, Insightful)
This is not about dictation software (Score:5, Interesting)
Re: (Score:2)
Double the Killer (Score:2)
Open Source, or Microsoft-Owned? (Score:5, Interesting)
[...]
you are not allowed to redistribute (parts of) HTK3
Re:Open Source, or Microsoft-Owned? (Score:5, Informative)
Simon does NOT contain the HTK toolkit - it merely executes commands.
HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".
We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.
It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?
-- bedahr
Re: (Score:2)
Re: (Score:2)
Considering it's McBarfles we're talking about here, I'm not surprised. The important point, however, is that it's even more deceptive than you think. If they were, in fact, using lard, calling it "beef tallow" would be false advertising because lard comes from pigs, not cattle.
Re: (Score:2)
Mind you, a lot of vegetarians will tolerate some of the nastier animal products put in cheese, so there are limits to what many people worry about. But it was a bit of a surprise to find out they did this.
I do see the point of the person who pointed out that "lard" is por
WHOA! "Open Source"="can look at code"!!?? WTF? (Score:2)
Hold on just a frakking minute.
What the hell is "open source (in the strict sense of you-can-look-at-the-code)"? Since when did anyone start to mean "open source" as code that was merely available but not modifiable? As this sibling comment [slashdot.org] points out (please mod him up, by the way), the term "Open Source" has a very specific meaning. This meaning was determined at the time this term was
Re: (Score:2, Informative)
This software's license most obviously violates requirements 1, 2, and 3. These are perhaps the most important provisions of the definition and form the basis for the power of calling a license an open source license. By not adhering to this definition when calling licenses and software "Open Source" you dilute the power the term carries. Simply calling something 'open source' because they allow you to look at the source code is something we should avoid because 'Open Source' requires freedom, not just source.
Simon does not violate this description in ANY way.
HTK is not redistributed with simon so simon itself complies exactly with what you are writing.
Simon does not depend on the HTK toolkit. It simply uses it to compile / maintain the model. If you have compiled the model already (simon explicitly asks if you have done that already when starting the first time) you can specify the path to it.
Simon will then just use the model and can still start programs, type text, etc.
There is absolutely no need fo
Re: (Score:2)
Re: (Score:2)
Which languages are supported? (Score:4, Insightful)
Re: (Score:1)
But it has everything to do with the model. You'd just need, for example, an Italian language model. (Sure the ui
Simon doesn't even include a language model - it does, however, include the means to create one.
-- Peter
Re: (Score:2)
Re:Which languages are supported? (Score:5, Informative)
If you follow the link to the Sourceforge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.
Open Source? (Score:2, Insightful)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Simon takes the julius engine and uses the recognition results to do something useful.
Please take a look at the screenshots at the sourceforge page (mentioned in the article).
-- bedahr
Aisle of it (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
Wiktionary != Wikipedia (Score:4, Interesting)
I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)
Re: (Score:1, Informative)
Re: (Score:2)
Hey this is slashdot... pedantry is the base of most of the discussions here...
you must be new here uh?
Re:Wiktionary != Wikipedia (Score:4, Funny)
Re: (Score:2)
Geez, if you can't cope with the fact that somebody *slightly* and *accidentally* misrepresented which "project" wiktionary is, you really need to step back and examine your life. Get over yourself.
Pedant's Revolt (Score:4, Informative)
No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.
However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.
Re: (Score:2)
What?
Oh, ah, yeah. Sorry, I've been away from Windows for a _long_ time. This is yet another one of those things that work great when you only have a few items, but really not that well anymore when the lists get longer. Like the task bar...by the time you have more than a few windows open, there isn't space anymore for the text.
Really, there are better ways. Speech
Uses in Telephony (Score:2, Interesting)
work together? (Score:1)
http://news.zdnet.com/2100-9593_22-5383536.html [zdnet.com]
or
http://developers.slashdot.org/article.pl?sid=04/09/13/1058241 [slashdot.org]
Re: (Score:1)
Project's webpage in English? (Score:2)
Re:Project's webpage in English? (Score:5, Informative)
We are sorry that there is no international homepage for this yet.
BUT: you are strongly encouraged to contact me with any questions: grasch < at > simon-listens.org
-- Peter
Re: (Score:2)
Whither Microsoft? (Score:3, Insightful)
Actually, the reason we're not there yet is because most people don't want it. Keyboards and mice are simply a better way to give instructions to your computer than speech recognition is. Could you imagine the clatter of a dozen or more people in close proximity chattering to their computers?
Re: (Score:2)
"HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED) where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research
Re: (Score:1)
Now, considering that much of said user majority out there is barely able to reach this iq threshold simply interacting amongst themselves, we should shortly be seeing this kind of pro
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I think you're wrong in this point. The reason we are not there yet is not because of demand. It's because the technology isn't quite good enough yet. It's getting very close, relative to five years ago.
Your argument is comparable to someone in the early 80's saying "The reason computers don't come with mice is because most people don't want it." While it's true that most people didn't want mice for DOS machines, the reason
Re: (Score:2)
Re: (Score:2)
You were not the buyer of the system. The demand is there (by the buyer). Unfortunately, the technology isn't there for that to be a very good user experience YET. So the USERs don't *all* like them.
(I happen to like these voice recognition systems, and nearly always use voice, as I find working through a button-driven menu very cumbersome on a cell phone. I'm sure neither of us is alone in our preference.)
Reason we're not there yet... (Score:2)
Re: (Score:2)
In my opinion, exposing the recognition results over e.g. dbus would be a better way than to quadruple the efforts by splitting this (HUGE) task across GNOME, KDE, Xfce, Windows, etc.
Maybe. IMHO the UI needs to be involved though to make sure the speech goes to the right place. Some good forethought could create a really cool environment. Could the WM or something pick it up off dbus and route it to the appropriate apps? Remember, we want "the computer" to respond to voice, not a particular app. "the computer" would determine context and send the voice input to the appropriate app - the one with focus initially until a smarter router is devised. You need the mediator, so not every app
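The mediator idea described here (one component receives recognition results and forwards them to whichever app has focus) might look roughly like this. It is a plain in-process sketch with no real dbus calls; `SpeechRouter` and its methods are hypothetical names, and a desktop implementation would replace the handler callbacks with messages on the bus.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], None]


class SpeechRouter:
    """Forward recognized text to the app that currently has focus."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._focused: Optional[str] = None

    def register(self, app: str, handler: Handler) -> None:
        """Let an app declare that it accepts speech input."""
        self._handlers[app] = handler

    def set_focus(self, app: str) -> None:
        """Record which registered app has focus (the window manager's job)."""
        if app not in self._handlers:
            raise KeyError(f"unknown app: {app}")
        self._focused = app

    def dispatch(self, text: str) -> Optional[str]:
        """Send text to the focused app; return its name, or None if no focus."""
        if self._focused is None:
            return None
        self._handlers[self._focused](text)
        return self._focused


router = SpeechRouter()
router.register("editor", lambda text: print("editor got:", text))
router.register("browser", lambda text: print("browser got:", text))
router.set_focus("browser")
router.dispatch("open new tab")
```

Routing by focus is the simplest policy mentioned in the comment; a smarter router would replace `dispatch` without the apps having to change.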
Re: (Score:2)
shipping forecast (Score:1)
filthy open-source (Score:4, Informative)
julius is open source.
htk is *NOT* open source.
The latter is a micro$oft by-product, as clearly shown by the license [cam.ac.uk] that you have to first agree with and then send your email to them in order to download the tarballs...
myself never done this since 1995.
Only one problem (Score:1)
My own personal acid test (Score:2)
Possibly incorrect grammatically, but it's the only obvious way to combine 3 homonyms into what passes for a sentence. Of course, someone saying that might be vehemently agreeing with you as well, "Right! Right! Right!". Sorting that out could be a mess. I've criticised the lack of progress on the speech recognition front for a decade. It's amazing how bad most speech recognition software is.
Here's a better test... Take a standard page of text (about 200 words). Scan it and run it
Really cool to see OSS speech rec come back (Score:2)
CMU Sphinx, another free speech recognizer (Score:3, Informative)
http://cmusphinx.sourceforge.net/ [sourceforge.net]
http://en.wikipedia.org/wiki/CMU_Sphinx [wikipedia.org]
Open Source Chinese Speech Recognition? (Score:2)
I am not interested in teaching the computer to recognize my terrible pronunciation, but rather in having some program expect to hear standard Chinese which I could practice with.
One extremely useful program I have found whic
Re: (Score:1)