IBM Strives For 'Superhuman' Speech Tech

Posted by ScuttleMonkey on Wed Jan 25, 2006 04:34 AM
from the fansubbing-in-jeopardy dept.
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
This discussion has been archived. No new comments can be posted.
  • Which ... (Score:4, Interesting)

    by spiny (87740) on Wednesday January 25 2006, @04:36AM (#14555797)
    (http://www.atari.st/ | Last Journal: Thursday April 27 2006, @05:27AM)
    Which witch blew the blue candle out ?
    • Re:Which ... by jakeweston (Score:3) Wednesday January 25 2006, @04:46AM
    • Re:Which ... by cs02rm0 (Score:2) Wednesday January 25 2006, @04:48AM
    • Re:Which ... by lahvak (Score:2) Wednesday January 25 2006, @04:49AM
      • Re:Which ... by prSpectiv2 (Score:1) Wednesday January 25 2006, @05:12AM
        • Re:Which ... by soulctcher (Score:1) Wednesday January 25 2006, @05:25AM
      • Re:Which ... by paedobear (Score:1) Wednesday January 25 2006, @05:32AM
        • Re:Which ... by Helios1182 (Score:2) Wednesday January 25 2006, @10:14AM
        • Re:Which ... by kryonD (Score:2) Wednesday January 25 2006, @12:34PM
          • Re:Which ... by Mattcelt (Score:2) Wednesday January 25 2006, @05:35PM
          • Re:Which ... by Sarisar (Score:1) Thursday January 26 2006, @03:25AM
        • Re:Which ... by lahvak (Score:2) Wednesday January 25 2006, @02:18PM
      • Re:Which ... by mehu (Score:1) Wednesday January 25 2006, @04:04PM
        • Re:Which ... by mehu (Score:1) Thursday January 26 2006, @06:53AM
    • Re:Which ... (Score:5, Interesting)

      by jcupitt65 (68879) on Wednesday January 25 2006, @05:41AM (#14556023)
      Or I can wreck a nice beach versus I can recognise speech.

      Sometimes you need rather a large context to disambiguate: is this sentence part of a discussion on shore-front management, or spoken language understanding?

      • Re:Which ... by FirienFirien (Score:2) Wednesday January 25 2006, @07:33AM
        • Re:Which ... by mwood (Score:3) Wednesday January 25 2006, @08:59AM
      • Re:Which ... by RossumsChild (Score:1) Wednesday January 25 2006, @08:37AM
        • Re:Which ... by Squalish (Score:2) Wednesday January 25 2006, @11:18AM
    • Fantastic direction by Simonetta (Score:2) Wednesday January 25 2006, @09:35AM
    • Re:Which ... by The Spoonman (Score:2) Wednesday January 25 2006, @10:03AM
    • Re:Which ... by Anonymous Coward (Score:1) Wednesday January 25 2006, @01:37PM
  • Coherency? (Score:5, Insightful)

    by PrinceAshitaka (562972) * on Wednesday January 25 2006, @04:38AM (#14555810)
    (http://www.euvsus.blogspot.com/)
    From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"

    Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.

    I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different structures. While in English the verb is usually the second element, in other languages (German, for example) the verb often comes last. For the translator to produce the second word of the English sentence, it would have to wait until the end of what could be a long sentence. This necessarily adds delay.
    • Re:Coherency? (Score:4, Interesting)

      by Yahweh Doesn't Exist (906833) on Wednesday January 25 2006, @04:48AM (#14555849)
      yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.

      since even "live" broadcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real life it will be effectively real-time for almost all practical purposes.
    • Re:Coherency? by grimJester (Score:2) Wednesday January 25 2006, @05:16AM
    • Re:Coherency? by wizrd_nml (Score:2) Wednesday January 25 2006, @05:37AM
    • Re:Coherency? by dancallaghan (Score:3) Wednesday January 25 2006, @05:40AM
    • And German is an easy one (Score:5, Informative)

      by Ogemaniac (841129) on Wednesday January 25 2006, @05:44AM (#14556030)
      It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.
    • Re:Coherency? by MaxiumMahem (Score:1) Wednesday January 25 2006, @05:59AM
      • Re:Coherency? by somersault (Score:1) Wednesday January 25 2006, @09:11AM
    • Re:Coherency? by kklein (Score:1) Wednesday January 25 2006, @07:07AM
      • Re:Coherency? by Anpheus (Score:1) Wednesday January 25 2006, @07:53AM
      • Re:Coherency? by somersault (Score:1) Wednesday January 25 2006, @09:17AM
        • Re:Coherency? by kklein (Score:1) Wednesday January 25 2006, @06:26PM
          • Re:Coherency? by somersault (Score:1) Thursday January 26 2006, @06:19AM
    • Re:Coherency? by vertinox (Score:2) Wednesday January 25 2006, @09:40AM
    • Re:Coherency? by foo fighter (Score:2) Wednesday January 25 2006, @10:13AM
    • Re:Coherency? by Guspaz (Score:2) Wednesday January 25 2006, @01:15PM
    • Re:Coherency? by Fratz (Score:2) Wednesday January 25 2006, @01:39PM
    • Re:Coherency? by Fruit (Score:1) Wednesday January 25 2006, @06:48AM
  • first? (Score:5, Funny)

    by Anonymous Coward on Wednesday January 25 2006, @04:39AM (#14555811)
    however the researchers stated "We still can't figure out what Bob Dylan is saying"
    • Re:first? by Orgazmus (Score:2) Wednesday January 25 2006, @04:42AM
    • Re:first? by Mr. Bad Example (Score:2) Wednesday January 25 2006, @10:32AM
    • Re:first? by bobdylan (Score:1) Wednesday January 25 2006, @12:35PM
  • by themysteryman73 (771100) on Wednesday January 25 2006, @04:39AM (#14555816)
    Reminds me of a Simpsons episode "Hello Homer, it's me, KITT from Knight Rider"

    Seriously though, this is a great advance in technology, but will it still be as funny to listen to? It's always fun typing in words into speech recognition programs and listening to the unexpected results!

  • Nuances (Score:4, Funny)

    by AnonymousYellowBelly (913452) on Wednesday January 25 2006, @04:43AM (#14555834)
    GB on TV: "We have prevailed"
    Subtitle: "All your base are belong to us"
    • Re:Nuances by (arg!)Styopa (Score:2) Wednesday January 25 2006, @09:10AM
  • NSA Babelfish (Score:2, Funny)

    by Elixon (832904) on Wednesday January 25 2006, @04:44AM (#14555837)
    (http://www.webdevelopers.cz/)
    I cannot wait to buy the first eBabelfish gadget that I can put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that nobody will consider it "important technology for the national security" and restrict it by any means...

    (I'm sure that this eBabelfish is already installed - not in my ear - but on the telecommunication centers...)
  • Opensource? (Score:1, Interesting)

    by Anonymous Coward on Wednesday January 25 2006, @04:45AM (#14555841)
    Will IBM make this technology public or will it be proprietary?
  • Foreign languages are complex... (Score:5, Insightful)

    by pubjames (468013) on Wednesday January 25 2006, @04:52AM (#14555857)
    I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

    It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.

    This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

    I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
  • Ghee... (Score:4, Insightful)

    by Anonymous Coward on Wednesday January 25 2006, @04:54AM (#14555864)
    Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?
    • Re:Ghee... by forgotten_my_nick (Score:2) Wednesday January 25 2006, @07:21AM
      • Re:Ghee... by amliebsch (Score:2) Wednesday January 25 2006, @09:35AM
      • Re:Ghee... by benjamindees (Score:1) Wednesday January 25 2006, @09:16PM
    • Re:Ghee... by SchwarzeReiter (Score:2) Wednesday January 25 2006, @11:22AM
  • by Viol8 (599362) on Wednesday January 25 2006, @04:56AM (#14555872)
    ...they should send it to Glasgow on a saturday night just after the pubs
    have closed.

    "Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll
    shoo ye!"
  • by yamum (893083) on Wednesday January 25 2006, @04:58AM (#14555879)
    ViaVoice was shipped with an older version of Mandrake Linux.

    Anyone know where I can get this from?
  • It isn't worth it (Score:5, Funny)

    by YearOfTheDragon (527417) on Wednesday January 25 2006, @05:00AM (#14555893)
    (http://www.vidaartificial.com/)
    Maybe IBM is going to make speech recognition real, but Bill Gates said that this was possible a long time ago [mpt.net.nz]. Simply genius.
  • On-The-Fly (Score:5, Informative)

    by Trurl's Machine (651488) on Wednesday January 25 2006, @05:02AM (#14555901)
    (Last Journal: Wednesday February 26 2003, @06:32AM)
    They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?

    I have read a lot of auto-translated documents and they are always good for a laugh as "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into a good context. A "fly" translated as an insect would be counted as a success!

    Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to commit a ridiculous faux pas. Polish and Czech, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...
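
    To make the "80% success" point concrete, here is a minimal Python sketch that contrasts two ways of scoring a transcript. It is not IBM's metric (the article does not describe one); `vocabulary_hit_rate` and `word_error_rate` are hypothetical helper names chosen for illustration. The first only counts output words found in a dictionary, as the comment suspects; the second is the usual word-level edit distance divided by the length of the reference.

```python
def vocabulary_hit_rate(hypothesis: str, vocabulary: set) -> float:
    """Fraction of output words that appear in the vocabulary at all."""
    words = hypothesis.lower().split()
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "I can recognise speech"
    heard = "I can wreck a nice beach"
    vocab = {"i", "can", "wreck", "a", "nice", "beach", "recognise", "speech"}
    print("vocabulary hit rate:", vocabulary_hit_rate(heard, vocab))
    print("word error rate:    ", word_error_rate(spoken, heard))
```

    With the "wreck a nice beach" example from earlier in the thread, every output word is in the vocabulary, so the hit rate is 100%, yet the word error rate is also 100% (two substitutions and two insertions against a four-word reference).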
    • Re:On-The-Fly by coofercat (Score:1) Wednesday January 25 2006, @05:39AM
      • Re:On-The-Fly by Red Alastor (Score:3) Wednesday January 25 2006, @06:48AM
    • Re:On-The-Fly by Aceticon (Score:2) Wednesday January 25 2006, @06:34AM
      • Re:On-The-Fly by Bohiti (Score:1) Wednesday January 25 2006, @12:11PM
    • Re:On-The-Fly by blackest_k (Score:2) Wednesday January 25 2006, @07:11AM
    • Re:On-The-Fly by Cro Magnon (Score:2) Wednesday January 25 2006, @08:56AM
    • Complexity by ratboy666 (Score:2) Wednesday January 25 2006, @11:43AM
    • Re:On-The-Fly by Deluge (Score:2) Wednesday January 25 2006, @02:45PM
      • Re:On-The-Fly by Trurl's Machine (Score:2) Wednesday January 25 2006, @04:33PM
  • IBM and Google cooperation to come? (Score:3, Interesting)

    by Mostly a lurker (634878) on Wednesday January 25 2006, @05:13AM (#14555934)
    IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) [slashdot.org] has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.
  • by thbb (200684) on Wednesday January 25 2006, @05:16AM (#14555949)
    (http://highc.org/)
    As has been the case for the past thirty years, the description of the system's prowess is still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...

    Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition [umd.edu].

    One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planning too. Hence, you can't plan what to do next as well while you speak, which is a big hurdle in the type of intellectual activities one carries out with a computer.
  • Awful default TTS (Score:4, Insightful)

    by Council (514577) <rmunroe.gmail@com> on Wednesday January 25 2006, @05:19AM (#14555957)
    (http://xkcd.com/)
    Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

    What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

    This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

    I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.
  • by shotgunefx (239460) on Wednesday January 25 2006, @05:21AM (#14555964)
    (Last Journal: Thursday November 09 2006, @10:31AM)
    Serious, you hear how some people "talk" these days?
  • ViaVoice (Score:1)

    by TheRealDamion (209415) on Wednesday January 25 2006, @05:27AM (#14555979)
    (http://trap.me.uk/damion/)
    The xvoice team have failed to get IBM to recompile newer ViaVoice libraries, or even the same code against a more modern libc, ld.so and gcc environment making it quite hard to keep it working on newer distributions. It's also limited to ia32. They certainly don't seem likely to release the source code.

    So I'm surprised to see an announcement like this one.
  • American or English? (Score:3, Interesting)

    by squoozer (730327) on Wednesday January 25 2006, @05:30AM (#14555989)
    (http://www.crazysquirrel.com/index.jspx)

    I realize that Americans and British (English at least ;o)) speak essentially the same language but I have yet to find any speech recognition software that can get more than roughly 85% of what I say correct. I have a fairly soft neutral English accent with pretty good enunciation so I would have expected to be getting a recognition rate in the high 90%s. I'm wondering if, as most of this software is developed in the US, it is tuned specifically to pick up on English with a US accent? I realize that you train the software for your voice but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?

  • Oh oh oh. (Score:3, Funny)

    by Anonymous Coward on Wednesday January 25 2006, @05:33AM (#14556003)
    I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.

    So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.
  • Not _that_ amazing (Score:2, Interesting)

    by johndoe42 (179131) on Wednesday January 25 2006, @05:42AM (#14556026)
    It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.

    The translation, on the other hand, sounds damned impressive. For unrestricted content, especially with an untrained voice (I imagine that IBM isn't individually training to each Al Jazeera talking head), 70% recognition sounds quite good. 70% accuracy post-translation ought to be quite a bit better than what's currently out there. The description of MASTOR, however, is useless -- it could easily describe anything that isn't word-for-word translation.
  • Buyer beware (Score:5, Insightful)

    by 99luftballon (838486) on Wednesday January 25 2006, @06:04AM (#14556085)
    Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

    Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a computer side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.

    So don't rush to buy. Let the labs check it out first.
  • Trusted Computing (Score:1)

    by The New Andy (873493) on Wednesday January 25 2006, @06:15AM (#14556120)
    (Last Journal: Tuesday April 12 2005, @07:06AM)
    This is one of those things that won't be possible with trusted computing. With encrypted audio+video streams for everything, all these cool technologies won't be able to be made. Hopefully, someone makes a program like this which goes mainstream - that ought to educate people about trusted computing as soon as they try to sneak it in.
  • by el_womble (779715) on Wednesday January 25 2006, @06:23AM (#14556147)
    (http://marshonsmacs.blogspot.com/)
    it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.

    Nintendogs: I've stopped trying to train my dog, it's never going to happen.
    Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
    Nokia: Even with just one voice command, my girlfriend's name, it still can't match my voice.

    If this can translate foreign languages into American (sic) then it definitely sounds like it could stand a chance at translating English into text and commands.
  • funny this subject should come up... (Score:2, Interesting)

    by dafragsta (577711) on Wednesday January 25 2006, @07:04AM (#14556257)
    I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha

    the training process definitely has its ups and downs. The more you work with it however, the more it becomes attenuated to your own speech patterns and moreover, the quirky words we use every day. If you can get past the first two or three hours, you'll see that it is totally worth the effort, especially if this IBM tech isn't available to end-users for some time. There is also an aspect of the software training you, while you train the software. At the present time, I can dictate to slightly slower than I can probably type.

    In the end, I can see where this would make a writing e-mails and other such time-consuming tasks, which involve spellchecking, grammar, and other proof reading significantly quicker. When you really hit your stride, it's easy to write at the speed of thought, which is really appealing. There are caveats, however. it's very easy to dictate several sentences worth of tax and taken for granted that it to everything down the way you attendedselect tax select select tax undo
  • I am surprised (Score:1)

    by Shar-Kali-Sharri (890290) on Wednesday January 25 2006, @07:08AM (#14556275)
    ... how critical people have been in their replies 'till now. I mean sure there are bound to be problems with this tech, but I think what's really interesting is the implications of a mostly successful on-the-fly translation, - babelfish anyone... Supposedly with fast enough computers and advanced enough programs - imagine being able to communicate with everyone in the whole **cking world.... This would have enormous consequences for everything... humanity unite - (or probably bloody warfare ...). It might be true that this would probably remove some people's motivation for learning other languages... but if you look at the world today, there are quite a lot of bi-lingual people, but how many tri-lingual and in extreme consequence of this tech - 500-lingual.... You could potentially communicate with bloody QuEthc-indians..... This is what I think is the real issue here - not that some subtitles might miss a joke....
    • Re:I am surprised by Dark_MadMax666 (Score:1) Wednesday January 25 2006, @12:40PM
  • Real-time eavesdropping (Score:2, Interesting)

    by 0xC2 (896799) on Wednesday January 25 2006, @07:30AM (#14556373)
    (http://hiranyaloka.com/)
    Although most of the discussion so far has focused on foreign language translation, this technology is about *real-time-audio-to-text* conversion. The feds will be able to monitor, analyze, and record our conversations in real time:

    Monitor all conversation.
    Apply real-time text filters.
    Assign live agents to priority eavesdropping.
    Profit!

    If you could apply a filter to listen in to any call what would it be?
  • Finally! (Score:2)

    by digitaldc (879047) on Wednesday January 25 2006, @07:35AM (#14556397)
    We can figure out just what the hell Ozzy Osbourne is saying!
  • Translating Arab TV (Score:3, Informative)

    by Perl-Pusher (555592) on Wednesday January 25 2006, @07:56AM (#14556481)
    I imagine it is easier to translate repetitive phrases such as "The zionist oppressor shall be eliminated", "The great Satan America will be destroyed" and "Our martyrs have struck fear in the hearts of the infidels".

    I was in Kuwait and watched Arab TV with English subtitles, it was enlightening to say the least. One long tribute to racism paid for by the Amir of Qatar. Only on Arab TV will you see such trash as "the jews are descended from pigs".

  • by Fear the Clam (230933) on Wednesday January 25 2006, @08:05AM (#14556523)
    One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.

    10 PRINT "DEATH TO AMERICA";
    20 GOTO 10

    RUN
  • by ian_mackereth (889101) on Wednesday January 25 2006, @08:36AM (#14556719)
    (Last Journal: Monday September 04 2006, @10:07PM)
    Is it really that hard to understand Chris or George Reeves saying "Up, up and awaaaayyy!"?
  • So I think there should be a program to resynthesize the "learned" words into the most exact average of any given way to say it. I'd love to hear the results, that would be fascinating.

  • Excellent Product, Confused Reviewers (Score:2, Informative)

    by MarsGov (300325) on Wednesday January 25 2006, @08:44AM (#14556790)
    ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained it's possible to achieve very decent accuracy.

    Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system will use your voice to train its recognition models to improve its word selection. Dictation systems also ask for samples of your documents to train their language models on how you put words together; that also helps determine the probability of proper word choice. (Example of how you put words together: "Peanut butter sandwich" is a much more likely choice than "peanut butter sand," and will get a higher score.)
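
    The "peanut butter sandwich" scoring idea can be sketched with a toy bigram language model in Python. This is only an illustration under stated assumptions: the three-sentence `CORPUS`, the function names `train_bigrams` and `log_score`, and the add-one smoothing are all made up for the example, and nothing here is taken from ViaVoice or any other real dictation product.

```python
import math
from collections import Counter

# Hypothetical "sample documents" a user might hand to a dictation system.
CORPUS = [
    "i ate a peanut butter sandwich",
    "she made a peanut butter sandwich",
    "the sand on the beach was hot",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    contexts = Counter()  # how often a word appears as the left-hand word
    bigrams = Counter()   # how often a (left, right) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def log_score(phrase, contexts, bigrams, alpha=1.0):
    """Log-probability of a phrase under the bigram model, add-alpha smoothed."""
    vocab_size = len(set(contexts) | {"</s>"})
    tokens = ["<s>"] + phrase.lower().split() + ["</s>"]
    total = 0.0
    for left, right in zip(tokens, tokens[1:]):
        numerator = bigrams[(left, right)] + alpha
        denominator = contexts[left] + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    contexts, bigrams = train_bigrams(CORPUS)
    for candidate in ("peanut butter sandwich", "peanut butter sand"):
        print(candidate, round(log_score(candidate, contexts, bigrams), 3))
```

    On this toy corpus "peanut butter sandwich" gets the higher (less negative) score because "butter sandwich" was seen in training and "butter sand" was not; a real recogniser would combine a score like this with the acoustic evidence when choosing between similar-sounding candidates.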

    The IBM announcement is about embedded, task-oriented speech recognition. It's not "superhuman," according to the article's text and ignoring its headline. I'll have an opportunity to see it in action next week at SpeechTek West [speechtek.com]. Expect to see other product announcements about speech technology in the next few days as the conference approaches.

    As for the TV translation software, it's still in the research stage according to the article. I've seen BBN's version of this software, and frankly it's amazing how good real-time translation can be.

    Bell Canada deployed Emily [speechtechmag.com] a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity.

    If you're interested in trying speech recognition and text-to-speech out for yourself, you can use Voxeo's servers, program in VoiceXML, and my Voice Conference Manager [sourceforge.net] app as a starting point (yeah, VCM needs a new release, and it's getting one soon).
  • by ijablokov (225328) on Wednesday January 25 2006, @08:54AM (#14556870)
    (http://www.yapme.com/)
    ...our speech-enabled Web browsers for mobile devices and set top boxes. More info on them here: http://ibm.com/pvc/multimodal [ibm.com]

    Not only do they allow you to navigate by voice, but using X+V (a blend of XHTML and VoiceXML), you could have fully speech-enabled Web apps. Example: "show me nearby sushi restaurants" or "movie schedules in my area".

    We also released our Multimodal Tools Project for Eclipse a couple weeks ago: http://alphaworks.ibm.com/tech/mmtp [ibm.com]

    Go ahead and play. ;-)
  • Let's see it translate poems (Score:3, Interesting)

    by roman_mir (125474) on Wednesday January 25 2006, @08:57AM (#14556900)
    (http://booktextmark.mozdev.org/)
    When and if it can translate poems [slashdot.org] from language to language, while keeping the style, the nuances, the rhythm, the cultural references, the general idea and the details, then we will know - it is done. Until then, don't hold your breath.
  • Anime fansubs! (Score:2)

    by CptNerd (455084) on Wednesday January 25 2006, @09:06AM (#14556979)
    (http://www.nerdwatch.com/)

    What a boon this will be to those anime fansub groups who can't find decent translators, or at least translators who aren't overworked.

    • Mod parent up! by Spy der Mann (Score:2) Wednesday January 25 2006, @11:15AM
  • by Ancient_Hacker (751168) on Wednesday January 25 2006, @09:20AM (#14557090)
    Ah yes, super-duper speech recognition is right around the corner!

    I've been hearing this every 6 months for about the last, oh, thirty years.

    Given that the state of the art in something much simpler, like automatic language translation, is pitifully inadequate, how likely is it IBM has conquered speech recognition AND translation?

    Har har har.

  • S-to-T in hospitals (Score:2, Interesting)

    by stardancer (665878) <abstractstar@gLI ... m minus language> on Wednesday January 25 2006, @09:20AM (#14557095)
    I know that one hospital in Norway has been experimenting with/testing speech-to-text software for a while, and reports say it's been very successful! (This supports what was said in an earlier comment about speech recognition within a tight context.) I believe the plan is to, at some point, eliminate the need for secretaries to transcribe what the doctors dictate, so that ideally the doctors can just speak into a mic and the text automagically appears in the patient's (electronic/digital) journal!

    This of course worries secretaries, since they might eventually lose their job/"career". On the other hand it would improve efficiency *a lot*.

  • by bdwoolman (561635) on Wednesday January 25 2006, @09:47AM (#14557367)
    (http://www.bdwoolman.net/)
    Here we go:

    I can wreck a nice beach. I can recognize speech.

    Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.

    I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.

    It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.

    Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.

    Which which blew the blue candle. Failed on the second "which" the b*tch.

    Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.

    One trouble. You do have to put the mike to sleep during family discussions.

  • WoT (Score:2)

    by foo fighter (151863) on Wednesday January 25 2006, @10:10AM (#14557609)
    (http://news.google.com/)
    ...perpetually monitors Arabic television...

    Sounds like the results of a DOD/DARPA/NSA funded research grant. They'd love to be able to translate on the fly, instead of having to train and pay actual humans to manually translate several hours -- or even days and weeks -- after the original transmission.

    Now that IBM has something kinda working and the grant money is running out they are trying to market it to the public. Kinda like Tang for the War on Terror-age.
    • Re:WoT by otis wildflower (Score:1) Wednesday January 25 2006, @10:29AM
  • 'Twas Brillig (Score:1)

    by engineerofsorts (692517) on Wednesday January 25 2006, @10:21AM (#14557720)
    I've always found it most entertaining to check the effects reciting Lewis Carroll's Jabberwocky has on any new/exciting speech reco program.

    On a more serious note, however, my wife was involved in an ill-fated-due-to-ancient-technology project back in grad school in the early 70's which involved:

    1. Speech recognition.
    2. Machine translation into a universal grammar
    3. Translation of the universal grammar into various target languages.
    4. Speech synthesis in the various target languages, using the same vocal qualities as the original speaker.

    Pretty lofty goals considering they were probably using computers with discrete components in them.

    Curiously, my wife (a native Japanese speaker) was teamed with the Suomi (Finnish) team because of the similarities in the two languages' structures.
  • what about... (Score:1)

    by blue_adept (40915) on Wednesday January 25 2006, @11:18AM (#14558398)
    "boy, I sure hope my stupid radio.. doesn't... uh... play 92.3"

    vs,

    "Does your radio suck? boy I sure hope my stupid radio doesn't. Uh, play 92.3"
  • breakdown of the article (Score:1, Interesting)

    by Anonymous Coward on Wednesday January 25 2006, @11:32AM (#14558612)
    The article is really saying two things:

    1. IBM has updated their ViaVoice large vocabulary continuous speech recognition (LVCSR) engine.

    2. IBM has paired ViaVoice with some clever apps to use the ViaVoice output in interesting ways (e.g. "on the fly" recognition, translation).

    Things that are not obvious from the article:

    1. ViaVoice has been around for ages and has always been pretty darn good at LVCSR. Without seeing numbers and knowing exactly how they were measured, it's impossible to know how much of an improvement 4.4 is over previous versions.

    2. Speaker-dependent speech recognition can always achieve much higher accuracy rates than speaker-independent systems like ViaVoice. Dragon NaturallySpeaking is an example of speaker-dependent speech recognition.

    3. Limited grammatical contexts (i.e. language models with low perplexity) always give better recognition than when you don't know what to expect next. For example, when your phone only has to tell "home" and "wife" apart, it's a lot less likely to make a mistake than if it has to figure out which word out of a list of 50,000 you just said. The more context, the better. The most interesting tech in the article seems to be the algorithms "that can determine this context on the fly."

    4. No improvements in translation technology were noted in the article; it sounds like they might as well have fed ViaVoice through BabelFish, made it happen in real time, and slapped a UI on it. The app might be new, but the tech is not.
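
    To put a number on point 3: here is a minimal sketch of perplexity for the two cases mentioned there. It assumes every word in each vocabulary is equally likely, which is an illustrative simplification and not how ViaVoice or any real language model behaves; the function and variable names are made up for the example.

```python
import math


def perplexity(probs):
    """Perplexity of a probability distribution: 2 raised to its entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


# Phone that only has to tell "home" and "wife" apart (assumed equally likely).
small_vocab = [0.5, 0.5]

# Open dictation over 50,000 words (uniform distribution is an assumption).
large_vocab = [1 / 50_000] * 50_000

small_pp = perplexity(small_vocab)
large_pp = perplexity(large_vocab)

print(f"two-word grammar perplexity: {small_pp:.1f}")
print(f"50,000-word vocabulary perplexity: {large_pp:.1f}")
```

    For a uniform distribution the perplexity is just the number of choices, so the recognizer is effectively picking among 2 candidates in one case and about 50,000 in the other. Any context that skews the probabilities toward the likely next words pushes that second number down.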
  • by Khyber (864651) <khyberkitsune@gmail.com> on Wednesday January 25 2006, @11:42AM (#14558733)
    (Last Journal: Saturday November 10, @03:30PM)
    "I had to help my uncle Jack off a horse."

    "I had to help my uncle jack off a horse."

    Will it ever catch that one?
  • I helped apple... (Score:2)

    by xquark (649804) on Wednesday January 25 2006, @11:54AM (#14558893)
    (http://www.partow.net/)
    I helped apple wreck a beach!
  • Another scam (Score:2)

    by Master of Transhuman (597628) on Wednesday January 25 2006, @12:25PM (#14559329)
    This must be the day of the week that scams are announced.

    First we have software that cannot be reverse engineered and guarantees the free speech rights of Americans.

    It comes attached to the Brooklyn Bridge and some Florida swamp land.

    Now we have this crap: "By limiting the domain, the system can make assumptions or inferences about what the user would like to accomplish, he said."

    This is not exactly "superhuman" speech recognition.

    None of this is feasible absent conceptual processing technology. Period.

    I don't know why I don't clean up at the public trough by simply announcing I have "true artificial intelligence" and wait for the checks to roll in before leaving for Brazil.

  • Unlikely (Score:2, Insightful)

    by rcbarnes (875915) on Wednesday January 25 2006, @01:31PM (#14560247)
    Transcription? Not too hard. Translation? I highly doubt it.

    Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of them, SYSTRAN, is the most used translation engine online), and that there was *no* discernible difference between engines of the eighties and current engines. I hope that they're not trying to claim that they suddenly overcame the vast problems of translation wholly independent of the linguistic community. That's just ludicrous.

    I'd love to see this engine handle a parasitic sentence like this between two largely different languages and catch the nuance in the parens: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will hit by chance, but only because the two languages happen to share a similar structure; the engine is lucky, not actually parsing the "meaning."
  • by schlick (73861) on Wednesday January 25 2006, @03:00PM (#14561184)
    if we all spoke Lojban [lojban.org].
  • by FCP (66221) <`moc.emani' `ta' `rhd'> on Wednesday January 25 2006, @05:16PM (#14562527)
    (http://slashdot.org/ | Last Journal: Wednesday March 03 2004, @05:45PM)
    I just love the example[1] the IBM marketroids chose for this: "For example, when asking for 'Radio 104.3 FM,' the new IBM-pioneered technology allows drivers to simply say, 'Tune to 104.3,' or 'Set the radio station to 104.3,' or 'Change the radio station to 104.3.'" Of all the amazing applications one could dream up, saving a driver from having to punch a radio preset is what they came up with.

    I rather like "Open the pod bay door, Hal" myself.

    --
    1. http://www-03.ibm.com/press/us/en/pressrelease/19150.wss [ibm.com]
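
    For what it's worth, once the words are already text, the three phrasings quoted above collapse to one command with very little work. A toy sketch, assuming the speech has already been transcribed correctly; the pattern and the function name are hypothetical and say nothing about how IBM's system does it:

```python
import re

# Hypothetical pattern covering only the three phrasings quoted from the press release.
TUNE_PATTERN = re.compile(
    r"^(?:tune to|(?:set|change) the radio station to)\s+(\d{2,3}\.\d)$",
    re.IGNORECASE,
)


def parse_tune_command(utterance):
    """Return the requested FM frequency as a float, or None if the text is not a tune command."""
    # Drop surrounding whitespace and a trailing period before matching.
    cleaned = utterance.strip().rstrip(".")
    match = TUNE_PATTERN.match(cleaned)
    return float(match.group(1)) if match else None


for phrase in (
    "Tune to 104.3",
    "Set the radio station to 104.3",
    "Change the radio station to 104.3",
    "Open the pod bay door, Hal",
):
    print(phrase, "->", parse_tune_command(phrase))
```

    The limited domain is doing all the work here: anything outside the handful of expected phrasings simply returns None.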
  • Re:Just what we need... (Score:5, Insightful)

    by pubjames (468013) on Wednesday January 25 2006, @04:55AM (#14555869)
    More opportunities for Arabic speaking people to misinterpret western media.

    I think you've got it the wrong way round, haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media"?
  • Re:Just what we need... (Score:4, Insightful)

    by user9918277462 (834092) on Wednesday January 25 2006, @06:13AM (#14556116)
    (Last Journal: Saturday December 24 2005, @03:18PM)
    There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (i.e., phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
  • Re:Just what we need... (Score:3, Insightful)

    by mwood (25379) on Wednesday January 25 2006, @09:03AM (#14556955)
    Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.
  • by yoprst (944706) on Wednesday January 25 2006, @09:34AM (#14557205)
    Tap all Arabic/international lines, install zillions of speech recognition nodes, make them write everything to log files and use grep to find whatever you want. Your Arabic may be a hundred times better, but you cannot do anything like that even if you hire the whole of Lebanon to help you.