
Open Source Speech Recognition 140

Posted by CmdrTaco on Saturday January 19, 2008 @11:14AM from the hello-computer dept.

bedahr writes "The first version of the open source speech recognition suite simon has been released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from Wiktionary (a subproject of Wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

This discussion has been archived. No new comments can be posted.


been playing with it (Score:5, Interesting)

by primadd ( 1215814 ) writes: on Saturday January 19, 2008 @11:20AM (#22109222) Homepage

I did use Julius for a small project utilizing voice recognition once. While not perfect, I was quite impressed by the results of the engine. Quite fun to control the light and TV with shouted commands, though once or twice a movie actually triggered "lights off".

--
webmasters: personalized bookmarking [primadd.net] scripts for your site
wp and phpbb plugin available

- Re:been playing with it (Score:5, Insightful)
  
  by Anonymous Coward writes: on Saturday January 19, 2008 @11:23AM (#22109254)
  
  You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.
  
  - Re: (Score:2, Funny)
    
    by Anonymous Coward writes:
    
    Only if you are not watching star trek!
    - Re:been playing with it (Score:5, Funny)
      
      by Woldry ( 928749 ) writes: on Saturday January 19, 2008 @12:06PM (#22109662) Journal
      
      "not watching star trek" --- wait, I don't follow.
      
      - Re: (Score:3, Funny)
        
        by DMUTPeregrine ( 612791 ) writes:
        
        "Not watching star trek" N. Watching Babylon 5.
  - Re:been playing with it (Score:5, Informative)
    
    by bedahr ( 1222520 ) writes: on Saturday January 19, 2008 @12:07PM (#22109676) Homepage
    
    This is actually what simon does: the magic keyword is "simon", as in "simon Firefox". -- bedahr
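Editor's sketch of the keyword-prefix idea discussed above: a recognized utterance is only treated as a command when it starts with the trigger word. This is not simon's actual code; the function name `extract_command` and its parameters are hypothetical, and it assumes the recognizer has already produced a text transcript.

```python
def extract_command(transcript, keyword="simon"):
    """Return the text after the trigger keyword, or None if the keyword is missing."""
    words = transcript.strip().split()
    if not words:
        return None
    # Tolerate punctuation attached to the keyword, e.g. "Computer:".
    first = words[0].strip(",.:;!?").lower()
    if first != keyword.lower():
        return None  # e.g. stray dialogue from the TV is ignored
    rest = " ".join(words[1:]).strip()
    return rest or None


print(extract_command("simon Firefox"))
print(extract_command("turn the lights off"))
print(extract_command("Computer: lights off", keyword="computer"))
```

A trigger word only lowers the false-trigger rate; as the thread jokes, a TV show that says the keyword itself would still get through.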
    
  - Re: (Score:2)
    
    by cp.tar ( 871488 ) writes:
    
    You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.
    My Mac does that as well, though the one time I tried it, I did not have much success. Maybe because I had a cold, maybe because I'm not a native speaker, so my fancy Mac hates my Slavic accent.
    But it's quite a nice thing anyway.
  - Re: (Score:2)
    
    by Vspirit ( 200600 ) writes:
    
    exactly my sentiment.
    I have been exploring the voice computer control,
    and having a name called before a direct command,
    helps the command being directed properly.
    
    I was using computer at first,
    then next I evolved the process and
    I used the name of the computer,
    say "lisa, open do that"
    combining naming and semantics.
    
    I started using "dragon naturally speaking"
    back in the nineties, and language control,
    is not a substitute for keyboard and mouse
    input control, but an additional control,
    suitable in context.
    
    with an o
    - Re: (Score:2)
      
      by Vspirit ( 200600 ) writes:
      
      oops, the slashdot parser removed a few tidbits.
      I used greater-than and less-than symbols,
      but they and the content in between were removed.
      
      to demonstrate. say is .&gt
      the command desirable would be:
      "lisa, &lt.task1.&gt &lt.device.&gt to &lt.task2.&gt &lt.identity.&gt";
      
      where
      task = tell,
      device = human||animal||computer,
      task2 = do_something,
      identity = me_registered||somebody_registered||computer_registered
      
      task management completed.
      
      Love to see it come true.
      
      C.
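Editor's sketch of the slot template from the comment above ("lisa, task1 device to task2 identity"), written as a regular expression with one named group per slot. The word lists in `SLOTS` are made-up stand-ins for the commenter's placeholders, and `parse_command` is a hypothetical helper, not part of any of the tools discussed here.

```python
import re

# Illustrative vocabularies only; the comment leaves the real ones open.
SLOTS = {
    "task1": ["tell", "ask"],
    "device": ["human", "animal", "computer"],
    "task2": ["call", "email", "wake"],
    "identity": ["me", "somebody", "computer"],
}

TEMPLATE = "{name}, {task1} {device} to {task2} {identity}"


def build_pattern(name):
    """Turn the template into a regex with one named group per slot."""
    groups = {
        slot: "(?P<%s>%s)" % (slot, "|".join(re.escape(w) for w in words))
        for slot, words in SLOTS.items()
    }
    body = TEMPLATE.format(name=re.escape(name), **groups)
    # Allow any amount of whitespace wherever the template has a space.
    body = body.replace(" ", r"\s+")
    return re.compile(r"^\s*" + body + r"\s*$", re.IGNORECASE)


def parse_command(utterance, name="lisa"):
    """Return the filled slots as a dict, or None if the utterance does not fit."""
    match = build_pattern(name).match(utterance)
    return match.groupdict() if match else None


print(parse_command("lisa, tell computer to wake me"))
print(parse_command("tell computer to wake me"))
```

Constraining commands to a small template like this is also what keeps recognition tractable: the engine only has to choose among a few words per slot.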
    - Re: (Score:2)
      
      by Vspirit ( 200600 ) writes:
      
      just to mention a project that has been
      working on this for a long time and deserves
      recognition:
      http://cmusphinx.sourceforge.net/html/cmusphinx.php [sourceforge.net]
      
      open source speech recognition project, university grade.
    - Re: (Score:2)
      
      by Vspirit ( 200600 ) writes:
      
      this post is simply for the purpose of archiving in context.
      
      computerized lip reading:
      http://science.slashdot.org/article.pl?sid=08/01/20/0141203 [slashdot.org]
      
      have a camera add to the technology base,
      improving accurate recognition of speech.
      
      lips can't lie, unless they tell lies.
      and you get what you give in return.
      
      C.
  - - Re: (Score:1)
      
      by cleatsupkeep ( 1132585 ) writes:
      
      What if we don't know how to pronounce it? :-).
      - Re: (Score:2)
        
        by Marcos Eliziario ( 969923 ) writes:
        
        Provided you mis-pronounce it always the same way, I think it's not much of a problem ;-)
    - Re: (Score:2)
      
      by iapetus ( 24050 ) writes:
      
      Although, of course, "prikazyvat" actually means "to order", so wouldn't make much sense. I prefer the HK-47 approach, using the noun form "prikaz", meaning "order".
      
      Prikaz: Start Firefox.
      Prikaz: Open new tab.
      Zamyechaniye: It would be quicker to use a mouse.
- Re: (Score:1)
  
  by hoppo ( 254995 ) writes:
  
  Interesting. Does it come with a pre-trained model? Are you able to train it more fully?
  - Re: (Score:1)
    
    by primadd ( 1215814 ) writes:
    
    There is one premade for Japanese and one for English. Neither is that big, though the English one is lacking even more. You can write your own grammar (see http://julius.sourceforge.jp/en/grammar.html [sourceforge.jp]), and for a limited set of commands that is the best way to go.
    
    --
    webmasters: personalized bookmarking [primadd.net] scripts for your site
    wp and phpbb plugin available
    - Re: (Score:2, Interesting)
      
      by bedahr ( 1222520 ) writes:
      
      Actually you don't need to get your hands dirty for writing your own grammar. Simon includes a complete grammar module with ways to compile the grammar, edit the sentence structures, import them from written texts (by looking the words up in the dictionary), etc.
      
      -- bedahr
- Re:been playing with it (Score:5, Funny)
  
  by Anonymous Coward writes: on Saturday January 19, 2008 @12:02PM (#22109614)
  
  Not perfect? Like, if you say "Open the pod bay doors, HAL," it'll say "I'm sorry, Dave, but I can't seem to do that," and try to kill you (even though your name is Steve)?
  
  - Re: (Score:2)
    
    by deander2 ( 26173 ) * writes:
    
    nitpick: it's "i'm sorry dave, i'm afraid i can't do that"
    
    yes, i'm a nerd. =P
- I could see how it could be useful in some apps (Score:2)
  
  by backslashdot ( 95548 ) writes:
  
  Say, for remotely controlling the TV or something: instead of having to remember the channel number you could just say "TV (or other trigger word), Discovery channel". I guess combined with an LCD/OLED button remote it could be used. Also, on a phone, it should be possible to use speech-to-text for certain stuff like adding items to a shopping list.
  
  The software has to be intelligent to know what to do when you press a button and say "shopping list, plums" etc.
  
  I dont think speech recognition is good
  - Re: (Score:1)
    
    by estevon07 ( 1068778 ) writes:
    
    It's funny how many Slashdot readers immediately think of science fair type applications when something interesting like an open source voice reco app is released. Not that such applications don't have merit, but when compared with high-dollar speech application companies like Nuance (http://www.nuance.com/) and Holly (http://www.holly-connects.com/), the real value of open source speech recognition becomes apparent - it's another critical piece of software needed to create a truly open source voice pl
- Mask out known audio? (Score:2)
  
  by crow ( 16139 ) writes:
  
  Well, if your TV is being controlled by the same computer (e.g., MythTV), then shouldn't the voice command software be able to mask out anything the microphone picks up that matches the output sound? Is there software that filters audio input to remove what is currently being played? I'm sure it's a bit tricky to get right, but it would be very useful for a range of applications, including this topic and speakerphones.
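Editor's sketch of the "mask out known audio" idea: the usual name for this is acoustic echo cancellation, and one common textbook approach is a normalized least-mean-squares (NLMS) adaptive filter. The toy below is pure Python on synthetic samples; the function name, the made-up `room` echo path, and all parameter values are assumptions for illustration, not anything MythTV or simon is known to do.

```python
import random


def nlms_echo_cancel(reference, microphone, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the played-back audio from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for ref_sample, mic_sample in zip(reference, microphone):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what is left after removing the echo
        norm = sum(x * x for x in history) + eps
        # NLMS update: nudge the weights toward explaining the remaining error.
        weights = [w + step * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def power(samples):
    return sum(s * s for s in samples) / len(samples)


# Toy demo: the microphone hears only a delayed, attenuated copy of the output.
random.seed(0)
played = [random.uniform(-1.0, 1.0) for _ in range(3000)]
room = [0.0, 0.6, 0.3, -0.2]  # made-up echo path (impulse response)
mic = [
    sum(room[k] * played[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(played))
]
residual = nlms_echo_cancel(played, mic)
print("mic power:", power(mic[-500:]))
print("residual power:", power(residual[-500:]))
```

In a real room the echo path is much longer and keeps changing, and the filter must stop adapting while the user is speaking, which is where the "tricky to get right" part comes in.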
- Re: (Score:2)
  
  by owlstead ( 636356 ) writes:
  
  Of course, you should use "illuminate", in which case only one movie will trigger the light switch :)
- Re: (Score:2)
  
  by samkass ( 174571 ) writes:
  
  Carnegie Mellon open-sourced Sphinx years ago: http://cmusphinx.sourceforge.net/html/cmusphinx.php [sourceforge.net]
  
  It's a speaker-independent, continuous speech recognizer that can be configured to do everything from simple commands to full-text dictation. It's not Dragon's stuff, but it's pretty good.
  
  They even have a pure Java version of it: http://cmusphinx.sourceforge.net/sphinx4/ [sourceforge.net]
Are they productive? (Score:4, Insightful)

by bogaboga ( 793279 ) writes: on Saturday January 19, 2008 @11:31AM (#22109328)

In my experience, I have not found speech recognition engines/software that productive. Too many errors and a slow [and steep] "learning" curve for the engine. I will have to be convinced that this simon thing is any different for me to give it a spin.

- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  No doubt you're correct, but it's got to be a boost for anybody who cannot type effectively.
  - Re: (Score:2, Funny)
    
    by Anonymous Coward writes:
    
    Hey! Let's leave slashdot out of this.
  - Re: (Score:1)
    
    by bedahr ( 1222520 ) writes:
    
    This is exactly what we are going for.
    
    Our training persons have spastic disabilities.
    
    -- bedahr
- Re:Are they productive? (Score:4, Insightful)
  
  by Yvanhoe ( 564877 ) writes: on Saturday January 19, 2008 @11:50AM (#22109490) Journal
  
  I doubt that speech recognition is ready to be used as an alternative to keyboards to type text, but I think it can become, after the keyboard and the mouse, a third input device that would boost the productivity of a computer user.
  
  - Re: (Score:1)
    
    by b.emile ( 1222958 ) writes:
    
    Exactly. I have played with some speech-to-text apps, and it always seems much slower and less accurate than if I were typing. They will have to be a lot more accurate for them to be useful for everyday use.
- Re: (Score:1)
  
  by deanlandolt ( 1004507 ) writes:
  
  Sure, unconstrained writing, for now. But there are countless applications with controlled vocabularies that stand to benefit today.
- Re:Are they productive? (Score:5, Funny)
  
  by Sanat ( 702 ) writes: on Saturday January 19, 2008 @12:06PM (#22109670)
  
  Dear aunt,let's set so double the killer delete select all
  
  - For those not familiar with this meme (Score:4, Informative)
    
    by CaptainPinko ( 753849 ) writes: on Saturday January 19, 2008 @01:53PM (#22110724)
    
    Basically it comes from a live voice recognition demo [google.ca] from Microsoft for their feature in Vista. Yes, I had to look this up myself.
    
  - War stories.... (Score:2)
    
    by EmbeddedJanitor ( 597831 ) writes:
    
    About 15 years ago I worked for a company doing, amongst other things, VR for telephone use. These systems had localised dictionaries to handle accents. We struggled to get the stuff going properly and the only combination we got to work reliably was a Fijian Indian person talking to a British-accent VR system. Go figure!
- Re:Are they productive? (Score:5, Insightful)
  
  by Instine ( 963303 ) writes: on Saturday January 19, 2008 @12:07PM (#22109680)
  
  Nearly five years ago I used to help a guy who had no useful movement in his limbs. He could use a mouth stick to type and control the cursor. However, he also used Dragon Dictate. His machine was old 7 years ago, and here's the amazing bit (to me at least): his speech was pretty garbled from his condition. Most humans found it very hard to understand him, yet the dictation software did a pretty good job. He wrote an entire screenplay (later commissioned by the BBC) and was a lawyer with his own practice (it may sound like it but I'm not making this up). His success with this tech was probably what got me into assistive tech (now my job).
  
  So how much it improves your productivity depends on who you are.
  
  - Re: (Score:2)
    
    by Instine ( 963303 ) writes:
    
    that should read nearly 7 years ago. And no I'm not using speech rec. Just typing quickly and badly ;P
  - Re: (Score:2)
    
    by blahplusplus ( 757119 ) writes:
    
    "So how much it improves your productivity depends on who you are."
    
    The biggest problem with speech-to-text is simply having to train the engine. I found Dragon NaturallySpeaking 9 not too bad; it's training it to recognize your own unique vocalizations that is the problem. I think speech-to-text and voice recognition is a project that demands Wikipedia-like sourcing of voices in different noisy environments and using millions of samples of people's voices to improve the algorithms, I'm surprised no one at
    - Re: (Score:3, Informative)
      
      by bedahr ( 1222520 ) writes:
      
      You might want to have a look at the voxforge project [voxforge.org]
      
      And this doesn't require changes in the algorithm - just in the model.
      
      -- bedahr
    - - Re: (Score:2)
        
        by blahplusplus ( 757119 ) writes:
        
        That's not what I'm talking about, they're going about collecting voices entirely the wrong way. The setup is horrible and most importantly they are not set up for user feedback and ratings. In realtime, they need something like a "hot or not" for people to input their own voices and recite words, etc, and then have a rating system that plays it back for other users (i.e. play a sample of text or a sentence, while converting it to text, and vice versa) then have a way for people to rate it, infinitely fast
        
        Re: (Score:2)
        
        by leenks ( 906881 ) writes:
        
        And people would do this because ... ? Their current scheme is good because people want to use that kind of service, and ultimately pay for it (which I guess allows them to pay someone to transcribe the text). This is much more valuable because the recordings will contain proper, off-the-cuff conversational speech (if a little contrived because of the circumstance). Models trained on call-home data etc invariably fail when given real world tasks, so hopefully this would work well.
        
        Re: (Score:2)
        
        by blahplusplus ( 757119 ) writes:
        
        Have you used Dragon NaturallySpeaking 9? I mean something like that, except whose interface is opened up to the public... i.e. a dictation / word application, etc, you can 'get' conversation from dictation, when I'm speaking into DS9 I'm speaking into it like I'm conversing with someone. I imagine you could get 90% of what you need out of something like that.
        
        Also if they needed real conversation I'm certain they could do a lot better than what they're doing (i.e. partner with possibly other call centers
  - I use only computer dictation for medical notes (Score:3, Informative)
    
    by KWTm ( 808824 ) writes:
    
    At my office, we use a computer dictation system for medical notes. It is amazingly accurate for those who speak with accents within the norm. It works well for me, and I will typically dictate something like this:
    "The patient presents today with three complaints comma as follows colon new paragraph For the past week comma he has had right shoulder pain period new paragraph He has noticed that when he sneezes comma there are streaks of blood in his mucus period new paragraph He has been experiencing diaph
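Editor's sketch of the spoken-punctuation convention in the dictation example above: tokens such as "comma", "period", "colon" and "new paragraph" are replaced by the marks they name. The `SPOKEN` table and `render_dictation` are hypothetical and deliberately naive; commercial dictation products presumably handle this inside the recognizer, and this version would also convert the word "period" in a phrase like "for a period of time".

```python
import re

# Spoken token -> text to emit. Only the tokens used in the comment are listed.
SPOKEN = {
    "new paragraph": "\n\n",
    "comma": ",",
    "period": ".",
    "colon": ":",
}

# Longest tokens first so multi-word tokens are matched before shorter ones.
_TOKENS = sorted(SPOKEN, key=len, reverse=True)
_PATTERN = re.compile(r"\s*\b(" + "|".join(_TOKENS) + r")\b\s*", re.IGNORECASE)


def render_dictation(spoken_text):
    """Replace spoken punctuation words with the punctuation they name."""
    def substitute(match):
        mark = SPOKEN[match.group(1).lower()]
        # Punctuation is followed by a space; a paragraph break is not.
        return mark if mark == "\n\n" else mark + " "

    text = _PATTERN.sub(substitute, spoken_text)
    text = re.sub(r" +\n", "\n", text)  # no trailing spaces before a break
    return text.strip()


print(render_dictation(
    "The patient presents today with three complaints comma as follows colon "
    "new paragraph For the past week comma he has had right shoulder pain period"
))
```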
- Re: (Score:1)
  
  by MacarooMac ( 1222684 ) writes:
  
  According to the Julius blurb [sourceforge.jp] on the acoustic models used, there are currently just two languages available: Japanese and English.
  
  "Since Julius itself is a language-independent decoding program, you can make a recognizer of a language if given an appropriate language model and acoustic model for the target language. The recognition accuracy largely depends on the models. "
  
  "We currently have a sample English acoustic model trained from the WSJ database. According to the license of the database, this mo
- Re: (Score:1, Insightful)
  
  by Anonymous Coward writes:
  
  In my experience, it mostly comes down to the quality of the microphone /sound input these days, with modern speech recognition software. Computer mics are atrocious. I got a high-end sound card (just an old but good emu10k1-based SB), a decent studio mike (Studio Projects) and a mic pre-amp, used the line-in input on the card, and get very, very good results. Of course, I paid more for my audio setup alone than many people pay for their PCs these days, especially after the noise-reduced fans, psu and c
- This is not about dictation software (Score:5, Interesting)
  
  by idji ( 984038 ) writes: on Saturday January 19, 2008 @03:05PM (#22111352)
  
  Many people think that "Speech recognition software" = "dictation software" - as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition - and a personal application at that - which is the only thing most people come across. Other applications are media transcription (closed captioning), media mining ("What did Obama say about the prime mortgage market this week?"), telephone call center controlling (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), aligning audio to human transcription (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings - think of any event that is transcribed, like UN conferences), etc. Don't you think CNN, BBC or any national film archive would be interested in searching through their millions of hours of recorded footage? Now you tell me - do you think that the holy grail of speech recognition is "HAL - please close the hatch", "Dear Mom, we are having a lovely time here..." or hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language? Dictation software is about the only speech-reco application that can be sold to the masses - all the rest is still fairly much below the horizon...
  
  - Re: (Score:2)
    
    by houstonbofh ( 602064 ) writes:
    
    And no one ever thinks about the most common use, command and control. Yet every phone maze has it now, and voice dial is on almost every cell phone.
Double the Killer (Score:2)

by rxmd ( 205533 ) writes:

Cue the obligatory lets set so double the killer delete select all. :)
- Open Source, or Microsoft-Owned? (Score:5, Interesting)
  
  by kripkenstein ( 913150 ) writes: on Saturday January 19, 2008 @01:00PM (#22110218) Homepage
  
  Cue the obligatory lets set so double the killer delete select all. :)
  Speaking of Microsoft, according to HTK's FAQ:
  HTK was originally developed at the Cambridge University Engineering Department (CUED). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. [...] Microsoft retains the copyright to the existing HTK code
  [...]
  you are not allowed to redistribute (parts of) HTK3
  In other words, HTK - a critical part of the 'Simon' project - is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder if Simon is not in violation of the license.
  
  - Re:Open Source, or Microsoft-Owned? (Score:5, Informative)
    
    by bedahr ( 1222520 ) writes: on Saturday January 19, 2008 @01:12PM (#22110366) Homepage
    
    Simon is in no way connected to Microsoft.
    
    Simon does NOT contain the HTK toolkit - it merely executes commands.
    
    HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".
    
    We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.
    
    It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?
    
    -- bedahr
    
    - Re: (Score:2)
      
      by Antique Geekmeister ( 740220 ) writes:
      
      Most of us will appreciate the careful handling by people like you, and the requirements to use the available tools. But yes, it is a big deal: Microsoft has a very bad history of "embracing and extending" software, and clearly breaking inter-operability in the process. Take a look at what they did to Kerberos when they incorporated it into Active Directory, and how it broke compatibility, and the resulting lawsuits and required patches by MIT to address the problems created by Microsoft and interacting with
      - Re: (Score:2)
        
        by techno-vampire ( 666512 ) writes:
        
        Like discovering that your McDonald's french fries are cooked with lard and thus not vegetarian, it's a big deal. (They called it "beef tallow..."
        
        Considering it's McBarfles we're talking about here, I'm not surprised. The important point, however, is that it's even more deceptive than you think. If they were, in fact, using lard, calling it "beef tallow" would be false advertising because lard comes from pigs, not cattle.
      - Re: (Score:2)
        
        by Antique Geekmeister ( 740220 ) writes:
        
        Reading the definitions you cite, I stand by "vegetarian". I'm not discussing an animal product that is harvested from animals that continue to live, such as eggs, or milk, I'm mentioning the use of beef fat rendered from slaughtered cows.
        
        Mind you, a lot of vegetarians will tolerate some of the nastier animal products put in cheese, so there are limits to what many people worry about. But it was a bit of a surprise to find out they did this.
        
        I do see the point of the person who pointed out that "lard" is por
    - WHOA! "Open Source"="can look at code"!!?? WTF? (Score:2)
      
      by KWTm ( 808824 ) writes:
      
      HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".
      Hold on just a frakking minute.
      
      What the hell is "open source (in the strict sense of you-can-look-at-the-code)"? Since when did anyone start to mean "open source" as code that was merely available but not modifiable? As this sibling comment [slashdot.org] points out (please mod him up, by the way), the term "Open Source" has a very specific meaning. This meaning was determined at the time this term was
    - - Re: (Score:2, Informative)
        
        by bedahr ( 1222520 ) writes:
        
        This software's license most obviously violates requirements 1, 2, and 3. These are perhaps the most important provisions of the definition and form the basis for the power of calling a license an open source license. By not adhering to this definition when calling licenses and software "Open Source" you dilute the power the terms carries. Simply calling something 'open source' because they allow you to look at the source code is something we should avoid because 'Open Source' requires freedom not just source.
        Simon does not violate this description in ANY way.
        HTK is not redistributed with simon so simon itself complies exactly with what you are writing.
        Simon does not depend on the HTK toolkit. It simply uses it to compile / maintain the model. If you have compiled the model already (simon explicitly asks if you have done that already when starting the first time) you can specify the path to it.
        Simon will then just use the model and can still start programs, type text, etc.
        There is absolutely no need fo
        
        Re: (Score:2)
        
        by Degrees ( 220395 ) writes:
        
        Not knowing much at all about this whole field of software, I'm going to ask: what other software can build the model simon can use? I think that would be pretty cool, if I could use simon with something that is released under the GPL or BSD style licenses. Thanks!
- Re: (Score:2)
  
  by $RANDOMLUSER ( 804576 ) writes:
  
  Cue the obligatory lets set so double the killer delete select all.
  
  Hey! It's hard to wreck a nice beach!
Which languages are supported? (Score:4, Insightful)

by r_jensen11 ( 598210 ) writes: on Saturday January 19, 2008 @11:47AM (#22109456)

That's great and all, but which languages are supported? I hope it's more than just English

- Re: (Score:1)
  
  by bedahr ( 1222520 ) writes:
  
  The language has nothing to do with the software.
  
  But it has everything to do with the model. You'd just need, for example, an Italian language model. (Sure, the UI /should/ probably be translated as well, but that has nothing to do with the recognition.)
  
  Simon doesn't even include a language model - it does, however include the means to create one.
  
  -- Peter
- Re: (Score:2)
  
  by dotancohen ( 1015143 ) writes:
  
  I hope it's more than just Lojban [wikipedia.org] grammar with English words.
- Re:Which languages are supported? (Score:5, Informative)
  
  by R.Mo_Robert ( 737913 ) writes: on Saturday January 19, 2008 @03:05PM (#22111358)
  
  If you follow the link to the Sourceforge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.
  
Open Source? (Score:2, Insightful)

by kylegordon ( 159137 ) writes:

If this is the first, what was Sphinx [sourceforge.net] then?
- Re: (Score:2)
  
  by kylegordon ( 159137 ) writes:
  
  Ooops, I'll learn to read stories properly one day :-)
- Re: (Score:1)
  
  by tg2k ( 895772 ) writes:
  
  Maybe we need to worry more about text-to-brain than speech-to-text...all it says is that it's the first version of simon. It doesn't say it's the first open source speech-to-text project.
  - Re: (Score:2)
    
    by kylegordon ( 159137 ) writes:
    
    Yep, see the above comment that I made just minutes after the previous one :-)
- Re: (Score:1)
  
  by bedahr ( 1222520 ) writes:
  
  Sphinx is just an engine - isn't it?
  
  Simon takes the julius engine and uses the recognition results to do something useful.
  
  Please take a look at the screenshots at the sourceforge page (mentioned in the article).
  
  -- bedahr
Aisle of it (Score:5, Funny)

by ZeroFactorial ( 1025676 ) writes: on Saturday January 19, 2008 @11:50AM (#22109488)

Eye musing i trite now two poster slashed hot. It saw grate pro gram!

- Re: (Score:3, Funny)
  
  by dotancohen ( 1015143 ) writes:
  
  You know, I actually read that in Festival [wikipedia.org]'s voice!
  - - Re: (Score:2)
      
      by dotancohen ( 1015143 ) writes:
      
      Unfortunately, I live in a flat above an everlasting festival of university students. I've learned to hate types of music that I had never before imagined exist.
Wiktionary != Wikipedia (Score:4, Interesting)

by Anonymous Coward writes: on Saturday January 19, 2008 @11:51AM (#22109500)

Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.

I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)

- Re: (Score:1, Informative)
  
  by Anonymous Coward writes:
  
  Although you are correct that Wikipedia and Wiktionary are equal in importance to Wikimedia, you must acknowledge that Wikipedia is the most well-known and talked-about project. Therefore, have a little grace with people who accidentally think or say that Wikipedia is the mother organization rather than Wikimedia. No need to be overly pedantic.
  - Re: (Score:2)
    
    by xtracto ( 837672 ) writes:
    
    No need to be overly pedantic.
    
    Hey this is slashdot... pedantry is the base of most of the discussions here...
    
    you must be new here uh?
- Re:Wiktionary != Wikipedia (Score:4, Funny)
  
  by kryten_nl ( 863119 ) writes: on Saturday January 19, 2008 @01:09PM (#22110342)
  
  Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.[Citation needed]
  
- Re: (Score:2)
  
  by Blakey Rat ( 99501 ) writes:
  
  On behalf of virtually everyone, I'd just like to say: Who the hell gives a crap?
  
  Geez, if you can't cope with the fact that somebody *slightly* and *accidentally* misrepresented which "project" wiktionary is, you really need to step back and examine your life. Get over yourself.
Pedant's Revolt (Score:4, Informative)

by jrothwell97 ( 968062 ) writes: <jonathan@nosPam.notroswell.com> on Saturday January 19, 2008 @12:00PM (#22109598) Homepage Journal

Simon can import dictionaries directly from wiktionary (a subproject of wikipedia)

No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.

However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.

- Re: (Score:2)
  
  by RAMMS+EIN ( 578166 ) writes:
  
  ``While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc)''
  
  What?
  
  Oh, ah, yeah. Sorry, I've been away from Windows for a _long_ time. This is yet another one of those things that work great when you only have a few items, but really not that well anymore when the lists get longer. Like the task bar...by the time you have more than a few windows open, there isn't space anymore for the text.
  
  Really, there are better ways. Speech
Uses in Telephony (Score:2, Interesting)

by Anonymous Coward writes:

This could be very useful in projects like FreeSWITCH which is an Open Source project for building telephony applications. More info at http://www.freeswitch.org/ [freeswitch.org]
work together? (Score:1)

by Jesus_Corpse ( 190811 ) writes:

Wouldn't it be a good idea to work with the (open source) speech recognition of IBM?

http://news.zdnet.com/2100-9593_22-5383536.html [zdnet.com]
or
http://developers.slashdot.org/article.pl?sid=04/09/13/1058241 [slashdot.org]
- Re: (Score:1)
  
  by debatem1 ( 1087307 ) writes:
  
  Unless something's changed pretty drastically, the IBM voice projects were dead a couple of years ago. I went up to the office where ViaVoice was handled and they wouldn't even let me buy a copy of it for linux.
Project's webpage in English? (Score:2)

by Lord Satri ( 609291 ) writes:

Trying to learn more about it, I followed the project's website link on the SourceForge page to simon-listens.org [simon-listens.org], but it's German only; I found no English (or other language) info. Does anyone have any advice?
- Re:Project's webpage in English? (Score:5, Informative)
  
  by bedahr ( 1222520 ) writes: on Saturday January 19, 2008 @12:57PM (#22110196) Homepage
  
  We are sorry that there is no international homepage for this yet.
  
  BUT: you are strongly encouraged to contact me with any questions: grasch < at > simon-listens.org
  
  -- Peter
  
  - Re: (Score:2)
    
    by mlc ( 16290 ) writes:
    
    So here's a question: I don't speak (much) German. How do I make this software do something?
Whither Microsoft? (Score:3, Insightful)

by IGnatius T Foobar ( 4328 ) writes: on Saturday January 19, 2008 @12:44PM (#22110068) Homepage Journal

Offices full of people talking to their computers have been Bill Gates' wet dream for decades now. What will happen if open source gets there first?

Actually, the reason we're not there yet is because most people don't want it. Keyboards and mice are simply a better way to give instructions to your computer than speech recognition is. Could you imagine the clatter of a dozen or more people in close proximity chattering to their computers?

- Re: (Score:2)
  
  by 3seas ( 184403 ) writes:
  
  Hmmm, from the http://htk.eng.cam.ac.uk/ [cam.ac.uk] site
  
  "HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED) where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research
- Re: (Score:1)
  
  by MacarooMac ( 1222684 ) writes:
  
  Agreed. Speech recognition will realistically only become a viable substitute for the traditional keyboard/mouse interface - for the user majority, at least - when the AI (and the processing power required to run it) has advanced to the point where you and your box can actually hold a 'semi intelligent' interaction/conversation.
  
  Now, considering that much of said user majority out there is barely able to reach this iq threshold simply interacting amongst themselves, we should shortly be seeing this kind of pro
- Re: (Score:2)
  
  by hyades1 ( 1149581 ) writes:
  
  Wither, Microsoft.
- Re: (Score:2)
  
  by mustafap ( 452510 ) writes:
  
  That's really true for the office environment, but I would love voice recognition at home - while in the kitchen cooking, in the bath ( no - don't comment on that one), crashed on the couch. Or even in the car.
- Re: (Score:2)
  
  by Chapter80 ( 926879 ) writes:
  
  Actually, the reason we're not there yet is because most people don't want it.
  I think you're wrong on this point. The reason we are not there yet is not a lack of demand. It's because the technology isn't quite good enough yet. It's getting very close, relative to five years ago.
  Your argument is comparable to someone in the early 80's saying "The reason computers don't come with mice is because most people don't want it." While it's true that most people didn't want mice for DOS machines, the reason
  - Re: (Score:2)
    
    by IGnatius T Foobar ( 4328 ) writes:
    
    Think of how many times you call a company these days, and your call is routed (or possibly handled entirely) by voice recognition software. It is DEFINITELY in demand. People want it. It's just not down to the PC yet.
    
    Actually, that's a perfect example. I hate those systems. I'd rather just press a key on the phone.
    - Re: (Score:2)
      
      by Chapter80 ( 926879 ) writes:
      
      You hate those systems, yet they are still in place, confirming my point.
      You were not the buyer of the system. The demand is there (by the buyer). Unfortunately, the technology isn't there for that to be a very good user experience YET. So the USERs don't *all* like them.
      (I happen to like these voice recognition systems, and nearly always use voice, as I find working through a button-driven menu very cumbersome on a cell phone. I'm sure neither of us is alone in our preference.)
- Reason we're not there yet... (Score:2)
  
  by gr8_phk ( 621180 ) writes:
  
  The reason we're not there yet is that standalone speech recognition software is stupid. We need KDE and gnome to have built-in speech recognition with a simple API so any application can just monitor the speech input. It should not come in as keystrokes though - must be separate. The speech engine should be a component so different ones can be used of course. If it was there, any app could use it easy enough.
  - - Re: (Score:2)
      
      by gr8_phk ( 621180 ) writes:
      
      In my opinion exposing the recognition results over e.g. dbus would be a better way than to quadruple the efforts by splitting this (HUGE) task to gnome, kde, xfce, window, etc.
      Maybe. IMHO the UI needs to be involved though to make sure the speech goes to the right place. Some good forethought could create a really cool environment. Could the WM or something pick it up off dbus and route it to the appropriate apps? Remember, we want "the computer" to respond to voice, not a particular app. "the computer" would determine context and send the voice input to the appropriate app - the one with focus initially until a smarter router is devised. You need the mediator, so not every app
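The mediator idea above can be pictured with a small sketch. This is only an illustration under stated assumptions: `SpeechRouter`, `register`, `set_focus` and `dispatch` are hypothetical names, not part of simon, Julius, KDE, GNOME or D-Bus, and the recognition engine itself is left out. The point is that applications subscribe once and only the focused one receives the utterance.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout: nothing here is an existing desktop
# or speech-engine API. It only illustrates "route speech to the app
# with focus" as described in the comment above.
Handler = Callable[[str], None]


class SpeechRouter:
    """Mediator that forwards recognized utterances to the focused app."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._focused: Optional[str] = None

    def register(self, app: str, handler: Handler) -> None:
        # Each application subscribes once; it never talks to the engine.
        self._handlers[app] = handler

    def set_focus(self, app: str) -> None:
        # Assumption: a window manager would report focus changes here.
        self._focused = app

    def dispatch(self, utterance: str) -> bool:
        # Deliver to the focused app only; report whether anyone handled it.
        if self._focused is None:
            return False
        handler = self._handlers.get(self._focused)
        if handler is None:
            return False
        handler(utterance)
        return True


editor_log: List[str] = []
player_log: List[str] = []

router = SpeechRouter()
router.register("editor", editor_log.append)
router.register("player", player_log.append)

router.set_focus("editor")
router.dispatch("save file")
router.set_focus("player")
router.dispatch("pause")
```

A smarter router could replace the focus check in `dispatch` with context-based matching, without any application having to change.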
- Re: (Score:2)
  
  by orin ( 113079 ) writes:
  
  How can Open Source get there first? Speech recognition is built into Vista! The "Speech Recognition" icon in the Control Panel kinda gives it away.
shipping forecast (Score:1)

by miruku ( 642921 ) writes:

I was thinking last night about what could be used to auto translate the Met/BBC Shipping Forecast [wikipedia.org] into lay speak (just 'cause). This project sounds promising.
filthy open-source (Score:4, Informative)

by jumbolo ( 75644 ) writes: on Saturday January 19, 2008 @02:06PM (#22110810)

simon is open source.
julius is open source.
htk is *NOT* open source.

The latter is a micro$oft by-product, as clearly shown by the license [cam.ac.uk] that you have to first agree with and then send your email to them in order to download the tarballs...

I have never done this myself since 1995.

Only one problem (Score:1)

by ThatsNotPudding ( 1045640 ) writes:

You have to speak in the voice of Comic Book Guy.
My own personal acid test (Score:2)

by PingXao ( 153057 ) writes:

Write 'rite' right.

Possibly incorrect grammatically, but it's the only obvious way to combine 3 homonyms into what passes for a sentence. Of course, someone saying that might be vehemently agreeing with you as well, "Right! Right! Right!". Sorting that out could be a mess. I've criticised the lack of progress on the speech recognition front for a decade. It's amazing how bad most speech recognition software is.

Here's a better test... Take a standard page of text (about 200 words). Scan it and run it
Really cool to see OSS speech rec come back (Score:2)

by seanthenerd ( 678349 ) writes:

I'm working on a home automation project and we've been looking for an OSS, linux-compatible speech rec system, but it seemed like every single Linux speech project died in the early 2000s when IBM sold their freeware ViaVoice system and the new company started charging for it. Seems like every single Linux project used it as the backend. The only other option was CMU's Sphinx work which looked impressive but almost impossible for non-speech-experts to use directly. This will be really cool to try out - k
CMU Sphinx, another free speech recognizer (Score:3, Informative)

by TorKlingberg ( 599697 ) writes: on Saturday January 19, 2008 @05:59PM (#22112858)

There is also CMU Sphinx, which is completely free (no HTK used) and very good quality.
http://cmusphinx.sourceforge.net/ [sourceforge.net]
http://en.wikipedia.org/wiki/CMU_Sphinx [wikipedia.org]

Open Source Chinese Speech Recognition? (Score:2)

by ZorroXXX ( 610877 ) writes:

Hi. I am currently learning Chinese and when reading this I thought that maybe speech recognition software could be useful (or maybe not, but at least I would like to try). Does anyone have any tips on what software I need (for Linux) that supports recognition of Chinese?
I am not interested in teaching the computer to recognize my terrible pronunciation, but rather in having some program expect to hear standard Chinese, which I could practice with.
One extremely useful program I have found whic
- Re: (Score:1)
  
  by ben(zen) ( 1162093 ) writes:
  
  Probably once it's in beta. Seriously, I'm not sure if I want Microsoft running any part of my car.
