Open Source Speech Recognition

Posted by CmdrTaco on Saturday January 19, @11:14AM
from the hello-computer dept.
bedahr writes "The first version of the open source speech recognition suite simon has been released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Open Source Speech Recognition: 50 Comments
  • been playing with it (Score:5, Interesting)

    by primadd (1215814) on Saturday January 19, @11:20AM (#22109222)
    I did use julius for a small project utilizing voice recognition once. While not perfect, I was quite impressed by the results of the engine. Quite fun to control the light and TV with shouted commands, though once or twice a movie actually triggered "lights off".

    --
    webmasters: personalized bookmarking [primadd.net] scripts for your site
    wp and phpbb plugin available
    • Re:been playing with it (Score:5, Insightful)

      by Anonymous Coward on Saturday January 19, @11:23AM (#22109254)
      You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.
      • Re: (Score:3, Informative)

        This is actually what simon does: the magic keyword is "simon". "simon Firefox", for example. -- bedahr
        • Re: (Score:2)

          Although, of course, "prikazyvat" actually means "to order", so wouldn't make much sense. I prefer the HK-47 approach, using the noun form "prikaz", meaning "order".

          Prikaz: Start Firefox.
          Prikaz: Open new tab.
          Zamyechaniye: It would be quicker to use a mouse
    • Re:been playing with it (Score:5, Funny)

      by Anonymous Coward on Saturday January 19, @12:02PM (#22109614)
      Not perfect? Like, if you say "Open the pod bay doors, HAL," it'll say "I'm sorry, Dave, but I can't seem to do that," and try to kill you (even though your name is Steve)?
    • Say for remotely controlling say the TV or something .. instead of having to remember the channel number you could just say "TV (or other trigger word), Discovery channel". I guess combined with an LCD/OLED button remote it could be used. Also, on a phone
    • Well, if your TV is being controlled by the same computer (e.g., MythTV), then shouldn't the voice command be able to mask out anything the microphone picks up that matches the output sound? If there software to filter audio input to filter out what is cu
    • Re: (Score:2)

      Of course, you should use "illuminate", in which case only one movie will trigger the light switch :)
    • Re: (Score:2)

      Carnegie Mellon open-sourced Sphinx years ago: http://cmusphinx.sourceforge.net/html/cmusphinx.php [sourceforge.net]

      It's a speaker-independent, continuous speech recognizer that can be configured to do everything from simple commands to full-text dictation. It's not Dragon'
        • Re: (Score:2, Interesting)

          Actually you don't need to get your hands dirty for writing your own grammar. Simon includes a complete grammar module with ways to compile the grammar, edit the sentence structures, import them from written texts (by looking the words up in the dictionary
  • Are they productive? (Score:4, Insightful)

    by bogaboga (793279) on Saturday January 19, @11:31AM (#22109328)
    In my experience, I have not found speech recognition engines/software that productive. Too many errors and a slow [and steep] "learning" curve for the engine. I will have to be convinced that this simon thing is any different for me to give it a spin.
    • Re: (Score:2, Insightful)

      by Anonymous Coward
      No doubt you're correct, but it's got to be a boost for anybody who cannot type effectively.
      • Re: (Score:2, Funny)

        by Anonymous Coward
        Hey! Let's leave slashdot out of this.
    • Re:Are they productive? (Score:4, Insightful)

      by Yvanhoe (564877) on Saturday January 19, @11:50AM (#22109490) Journal
      I doubt that speech recognition is ready to be used as an alternative to keyboards to type text, but I think it can become, after the keyboard and the mouse, a third input device that would boost the productivity of a computer user.
    • Re:Are they productive? (Score:5, Funny)

      by Sanat (702) on Saturday January 19, @12:06PM (#22109670)
      Dear aunt,let's set so double the killer delete select all

    • Re:Are they productive? (Score:5, Insightful)

      by Instine (963303) on Saturday January 19, @12:07PM (#22109680) Homepage
      Nearly five years ago I used to help a guy who had no useful movement in his limbs. He could use a mouth stick to type and control the cursor. However, he also used Dragon Dictate. His machine was old 7 years ago, and here's the amazing bit (to me at least): his speech was pretty garbled from his condition. Most humans found it very hard to understand him, yet the dictation software did a pretty good job. He wrote an entire screenplay (later commissioned by the BBC) and was a lawyer with his own practice (it may sound like it, but I'm not making this up). His success with this tech was probably what got me into assistive tech (now my job).

      So how much it improves your productivity depends on who you are.
      • Re: (Score:2)

        that should read nearly 7 years ago. And no I'm not using speech rec. Just typing quickly and badly ;P
      • Re: (Score:2)

        "So how much it improves your productivity depends on who you are."

        The biggest problem with text to speech is simply having to train the engine, I found Dragon Naturally speaking 9 not too bad, it's training it to recognize your own unique vocalizations tha
        • Re: (Score:3, Informative)

          You might want to have a look at the voxforge project [voxforge.org]

          And this doesn't require changes in the algorithm - just in the model.

          -- bedahr

      • At my office, we use a computer dictation system for medical notes. It is amazingly accurate for those who speak with accents within the norm. It works well for me, and I will typically dictate something like this:

        "The patient presents today with three c
    • This is not about dictation software (Score:5, Interesting)

      by idji (984038) on Saturday January 19, @03:05PM (#22111352)
      Many people think that "Speech recognition software" = "dictation software" - as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition - and a personal application at that - which is the only thing most people come across. Other applications are media transcription (closed captioning), media mining ("What did Obama say about the prime mortgage market this week?"), telephone call center monitoring (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), aligning audio to human transcription (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings - think of any event that is transcribed, like UN conferences), etc. Don't you think CNN, BBC or any national film archive would be interested in searching through their millions of hours of recorded footage? Now you tell me - do you think that the holy grail of speech recognition is "HAL - please close the hatch", "Dear Mom, we are having a lovely time here..." or hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language? Dictation software is about the only speech-reco application that can be sold to the masses - all the rest is still fairly much below the horizon...
  • Cue the obligatory lets set so double the killer delete select all. :)
    • Open Source, or Microsoft-Owned? (Score:5, Interesting)

      by kripkenstein (913150) on Saturday January 19, @01:00PM (#22110218)

      Cue the obligatory lets set so double the killer delete select all. :)
      Speaking of Microsoft, according to HTK's FAQ:

      HTK was originally developed at the Cambridge University Engineering Department (CUED). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. [...] Microsoft retains the copyright to the existing HTK code
      [...]
      you are not allowed to redistribute (parts of) HTK3
      In other words, HTK - a critical part of the 'Simon' project - is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder if Simon is not in violation of the license.
      • Re:Open Source, or Microsoft-Owned? (Score:5, Informative)

        by bedahr (1222520) on Saturday January 19, @01:12PM (#22110366) Homepage
        Simon is in no way connected to Microsoft.

        Simon does NOT contain the HTK toolkit - it merely executes commands.

        HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".

        We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.

        It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?

        -- bedahr
    • Re: (Score:2)

      Cue the obligatory lets set so double the killer delete select all.
      Hey! It's hard to wreck a nice beach!
  • Which languages are supported? (Score:4, Insightful)

    by r_jensen11 (598210) on Saturday January 19, @11:47AM (#22109456)
    That's great and all, but which languages are supported? I hope it's more than just English
    • Re: (Score:2)

      I hope it's more than just Lojban [wikipedia.org] grammar with English words.
    • Re:Which languages are supported? (Score:5, Informative)

      by R.Mo_Robert (737913) on Saturday January 19, @03:05PM (#22111358)

      If you follow the link to the Sourceforge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.

  • Open Source? (Score:2, Insightful)

    If this is the first, what was Sphinx [sourceforge.net] then?
  • Aisle of it (Score:5, Funny)

    by ZeroFactorial (1025676) on Saturday January 19, @11:50AM (#22109488)
    Eye musing i trite now two poster slashed hot. It saw grate pro gram!
  • Wiktionary != Wikipedia (Score:4, Interesting)

    by Anonymous Coward on Saturday January 19, @11:51AM (#22109500)
    Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.

    I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)
    • Re:Wiktionary != Wikipedia (Score:4, Funny)

      by kryten_nl (863119) on Saturday January 19, @01:09PM (#22110342)

      Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.[Citation needed]
      • Re: (Score:2)

        No need to be overly pedantic.

        Hey this is slashdot... pedantry is the base of most of the discussions here...

        you must be new here uh?
  • Pedant's Revolt (Score:4, Informative)

    Simon can import dictionaries directly from wiktionary (a subproject of wikipedia)

    No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.

    However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.

  • Uses in Telephony (Score:2, Interesting)

    by Anonymous Coward
    This could be very useful in projects like FreeSWITCH which is an Open Source project for building telephony applications. More info at http://www.freeswitch.org/ [freeswitch.org]
  • Trying to learn more about it, I followed the project's website link on the sourceforge page to simon-listens.org [simon-listens.org], but it's German only; I found no English (or other language) info. Does anyone have any advice?
  • Whither Microsoft? (Score:3, Insightful)

    by IGnatius T Foobar (4328) on Saturday January 19, @12:44PM (#22110068) Homepage Journal
    Offices full of people talking to their computers has been Bill Gates' wet dream for decades now. What will happen if open source gets there first?

    Actually, the reason we're not there yet is because most people don't want it. Keyboards and mice are simply a better way to give instructions to your computer than speech recognition is. Could you imagine the clatter of a dozen or more people in close proximity chattering to their computers?
    • Re: (Score:2)

      Hmmm, from the http://htk.eng.cam.ac.uk/ [cam.ac.uk] site

      "HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED) where it has been used to
    • Re: (Score:2)

      Wither, Microsoft.
    • Re: (Score:2)

      That's really true for the office environment, but I would love voice recognition at home - while in the kitchen cooking, in the bath ( no - don't comment on that one), crashed on the couch. Or even in the car.
    • Re: (Score:2)

      Actually, the reason we're not there yet is because most people don't want it.
      I think you're wrong on this point. The reason we are not there yet is not because of demand. It's because the technology isn't quite good enough yet. It's getting very close
  • filthy open-source (Score:4, Informative)

    by jumbolo (75644) on Saturday January 19, @02:06PM (#22110810)
    simon is open source.
    julius is open source.
    htk is *NOT* open source.

    The latter is a micro$oft by-product, as clearly shown by the license [cam.ac.uk] that you have to first agree with and then send your email to them in order to download the tarballs...

    myself never done this since 1995.
  • by TorKlingberg (599697) on Saturday January 19, @05:59PM (#22112858)
    There is also CMU Sphinx, which is completely free (no HTK used) and very good quality.
    http://cmusphinx.sourceforge.net/ [sourceforge.net]
    http://en.wikipedia.org/wiki/CMU_Sphinx [wikipedia.org]