IBM Strives For 'Superhuman' Speech Tech (289 comments)

Posted by ScuttleMonkey on Wednesday January 25, 2006 @05:34AM from the fansubbing-in-jeopardy dept.

robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."

This discussion has been archived. No new comments can be posted.

Which ... (Score:4, Interesting)

by spiny ( 87740 ) writes: on Wednesday January 25, 2006 @05:36AM (#14555797) Homepage Journal

Which witch blew the blue candle out ?

- Re:Which ... (Score:3, Funny)
  
  by jakeweston ( 785112 ) writes:
  
  To wreck a nice beach...
- Re:Which ... (Score:2)
  
  by cs02rm0 ( 654673 ) writes:
  
  The one that understood context.
- Re:Which ... (Score:2)
  
  by lahvak ( 69490 ) writes:
  
  Not really a problem. Machine translation can already handle many words that are spelled the same but have different meanings (homographs), based on context and position in the sentence. With speech recognition you just have more of those, because you have to throw in homophones, too.
  
  For a simple example, "blue" in "the blue candle" cannot be a verb.
- Re:Which ... (Score:5, Interesting)
  
  by jcupitt65 ( 68879 ) writes: on Wednesday January 25, 2006 @06:41AM (#14556023)
  
  Or I can wreck a nice beach versus I can recognise speech.
  Sometimes you need rather a large context to disambiguate: is this sentence part of a discussion on shore-front management, or spoken language understanding?
  
  - Re:Which ... (Score:2)
    
    by FirienFirien ( 857374 ) writes:
    
    I agree with the parent, but will take it one step further:
    
    So do we. I can recognise the differences and meaning of "Which witch blew the blue candle" written - but if someone said it to me out of the blue (npi), I'd have to think through it a couple of times to parse it, because if said as intended - with matching sounds, to rely entirely on context inside the sentence to decipher which word is which, then I'd have as much problem as a computer. The semantic rules I was taught as a child are what enables
    - Re:Which ... (Score:3, Insightful)
      
      by mwood ( 25379 ) writes:
      
      Just remember that *you* have a truly enormous and well-filled content-addressable memory, a huge and richly-connected semantic network, and untold numbers of self-adapting heuristics that have been trained all day every day for decades, with more coming into production constantly. It's hard for a machine to match that. Feeding 100,000 distinct pattern matchers in parallel is something most computers just aren't architected to do well. That a machine can do even a passable job of speaker-independent cont
- Fantastic direction (Score:2)
  
  by Simonetta ( 207550 ) writes:
  
  This is a fantastic development. It is exactly the kind of thing that 64-bit processors were made for. It is the 'killer app', the best since MP3 and CD-rippers. If it actually works, the high-tech equivalent of 'in-shaa Allah'.
  
  We should encourage IBM to allow enough of the technology to 'escape' in order to enable other languages to be translated from speech into English. There should be some kind of open review of the translation involved, also. This can help prevent subtle errors in tr
Coherency? (Score:5, Insightful)

by PrinceAshitaka ( 562972 ) * writes: on Wednesday January 25, 2006 @05:38AM (#14555810) Homepage

From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added."

Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.

I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real time translation for the following reason. Sentences in different languages have different sentence structures. While in English the verb is usually the second part, in other languages (German, for example) the verb often comes last. For the translator to get the second word of a sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.
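
As a rough illustration of the verb-final argument above (a toy sketch, not how IBM's system works): assume each incoming word already carries a hypothetical role tag (`S`, `O` or `V`) and is already glossed into English, so only the reordering delay is modelled. The name `emit_svo` and the tags are made up for this example.

```python
from typing import List, Tuple


def emit_svo(tokens: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Re-emit a verb-final clause in English subject-verb-object order.

    tokens: (word, role) pairs in arrival order, role in {"S", "O", "V"}.
    Returns (step_at_which_the_word_could_be_emitted, word) pairs.
    """
    out: List[Tuple[int, str]] = []
    held: List[str] = []  # object words waiting for the verb
    verb_seen = False
    for step, (word, role) in enumerate(tokens, start=1):
        if role == "V":
            verb_seen = True
            out.append((step, word))
            # The verb has arrived, so everything held back can follow it.
            out.extend((step, w) for w in held)
            held.clear()
        elif role == "O" and not verb_seen:
            held.append(word)  # cannot be placed yet: English wants the verb first
        else:
            out.append((step, word))
    # A clause that ended without a verb: flush whatever is left.
    out.extend((len(tokens), w) for w in held)
    return out


clause = [("he", "S"), ("the", "O"), ("book", "O"), ("read", "V")]
print(emit_svo(clause))
# [(1, 'he'), (4, 'read'), (4, 'the'), (4, 'book')]
```

Here the second English word, "read", cannot be emitted before step 4, when the verb finally arrives. A longer clause would push that further back.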

- Re:Coherency? (Score:4, Interesting)
  
  by Yahweh Doesn't Exist ( 906833 ) writes: on Wednesday January 25, 2006 @05:48AM (#14555849)
  
  yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.
  
  since even "live" broadcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real-life it will be effectively real-time anyway for almost all practical purposes.
  
- Re:Coherency? (Score:2)
  
  by grimJester ( 890090 ) writes:
  
  In what cases is a four minute delay noticeable if the picture and sound are delayed four minutes too? I'd love this for watching movies that are currently completely incomprehensible to me.
  
  For the 80% part, it's good enough to get the gist of what is said. It won't compete with professional human translators, but it will make translation easily available for those who don't have access to a translator.
  - Re:Coherency? (Score:3, Funny)
    
    by sumdumass ( 711423 ) writes:
    
    I'm wondering if this was used during the lead-up to Iraq? "i'm unclear if there are bombs here" ends up getting translated into "there are nuclear bombs here".
- Re:Coherency? (Score:2, Informative)
  
  by wizrd_nml ( 661928 ) writes:
  
  For the translator to get the second word of a sentence, it would have to wait till the end, of what could be a long sentence. This necessarily adds delay.
  Not necessarily. An on-the-fly translator could translate words as it hears them filling in the translated words in the correct location in the sentence. In other words, the sentence doesn't have to be completed in order. It can dynamically expand to fit in new words.
  If you listen to human translators doing on-the-fly translation you'll see this is h
- Re:Coherency? (Score:3, Interesting)
  
  by dancallaghan ( 890674 ) writes:
  
  but I personally don't think there could ever be real time translation for the following reason. [German]
  You are going to have that problem whether it's a machine doing the translating or a human. As I understand it, interpreters of German get around this by some quick-thinking restructuring of the translated sentence, or they simply lag a half-sentence or so behind.
  The real problem for machine translation is, and always has been, determining the sense of a word from context (indeed I recall a recent S
- And German is an easy one (Score:5, Informative)
  
  by Ogemaniac ( 841129 ) writes: on Wednesday January 25, 2006 @06:44AM (#14556030)
  
  It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.
  
  - No, German also changes word order (Score:2)
    
    by hughk ( 248126 ) writes:
    
    Although from the same linguistic family (but English also owes a lot to French and Latin) there are some important grammatical differences. The issue with interpreting German is that the verb (and any negation) may come at the end of the sentence. German can have some very long sentences.
    For a human, the issue is that you can't interpret based on the phrase, so a human interpreter has quite a lot to do. The interesting thing is that experienced interpreters do this unconsciously.
    I have been an admirin
    - I agree, it does (Score:2)
      
      by Ogemaniac ( 841129 ) writes:
      
      But not to the extent of Japanese. I lived in Austria for a summer, and after just three months, with no prior study, I started "getting" it sometimes. On the other hand, with 2.5 years of university study and ten months of living in Japan, I often have a hard time following the logic of a long sentence - even when written and when I know all of the words.
      
      Generally, it is estimated that it takes an English speaker about twice as long to learn a language from the Asian or Arabian groups as it does a European
    - Re:No, German also changes word order (Score:2)
      
      by mwood ( 25379 ) writes:
      
      For some entertaining examples, see Mark Twain's "The Awful German Language".
  - Re:And German is an easy one (Score:2)
    
    by ookaze ( 227977 ) writes:
    
    I would not say that German is easy.
    Anyway, in Japanese, you forgot the fact that the verb is not even always present in the sentence (just guessed depending on the context), and that sometimes, with the exact same sentence, subject and object are switched depending on the context too.
    This requires some training to understand. I still have not mastered it well, and seeing lots of fansubs shows me that I'm not the only one who has not mastered this (and I'm not the worst).
    I guess a machine would have a really
  - - Re:Sorry to disagree. (Score:2)
      
      by dunkelfalke ( 91624 ) writes:
      
      Slavic languages are also Indo-European.
- Re:Coherency? (Score:2)
  
  by vertinox ( 846076 ) writes:
  
  I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real time translation for the following reason...
  
  Still, the only thing faster or just as fast is a human translator for real time translation. Even then it is more or less based on the skill of the person doing the translating.
first? (Score:5, Funny)

by Anonymous Coward writes: on Wednesday January 25, 2006 @05:39AM (#14555811)

however the researchers stated "We still can't figure out what Bob Dylan is saying"

- Re:first? (Score:2)
  
  by Orgazmus ( 761208 ) writes:
  
  Bobs speck is totly legbl
Nuances (Score:4, Funny)

by AnonymousYellowBelly ( 913452 ) writes: on Wednesday January 25, 2006 @05:43AM (#14555834)

GB on TV: "We have prevailed"
Subtitle: "All your base are belongs to us"

- Re:Nuances (Score:2)
  
  by argStyopa ( 232550 ) writes:
  
  Not sure why this is rated as Funny (+5).
  Sounds like a perfect translation success to me.
NSA Babelfish (Score:2, Funny)

by Elixon ( 832904 ) writes:

I cannot wait until I buty the first eBabelfish gadget to put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that nobody will consider it "important technology for the national security" and restrict it by any means...

(I'm sure that this eBabelfish is already installed - not in my ear - but on the telecommunication centers...)
- Re:NSA Babelfish (Score:2)
  
  by sumdumass ( 711423 ) writes:
  
  You don't want to understand what they are saying. I have heard them and just trust me on this.
  
  BTW nice 'buty'
Foreign languages are complex... (Score:5, Insightful)

by pubjames ( 468013 ) writes: on Wednesday January 25, 2006 @05:52AM (#14555857)

I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.

This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.

- Re:Foreign languages are complex... (Score:3, Insightful)
  
  by Viol8 ( 599362 ) writes:
  
  "It's not until you learn another foreign language that you realise how complex languages are, and how subtle."
  
  And how weird sometimes. English for example loves to use the word "up" in all
  sorts of unsuitable places:
  
  give up
  shut up
  fed up
  wash up
  fuck up
  laid up
  muck up
  turn up
  free up
  look up
  make up
  put up
  screw up
  hang up
  wrap up
  hold up
  grow up
  
  Wtf?
  
  And how come we say "didn't he.." but in longhand it's "did he not..."? Shouldn't
  it be "did not he"? Why does the "not" shift to the other side of the pronoun?
  But then all la
  - Re:Foreign languages are complex... (Score:5, Funny)
    
    by MPHellwig ( 847067 ) writes: <mhellwig@xs4all.nl> on Wednesday January 25, 2006 @06:09AM (#14555921) Homepage
    
    And of course: "Up yours!" ;-)
    
    - Re:Foreign languages are complex... (Score:2)
      
      by bogado ( 25959 ) writes:
      
      And of course: "Up yours!" ;-)
      
      Well, in this particular case doesn't "up" mean what the word is supposed to mean?
  - Re:Foreign languages are complex... (Score:5, Funny)
    
    by Splab ( 574204 ) writes: on Wednesday January 25, 2006 @07:30AM (#14556166)
    
    From The Boondock Saints:
    Rocco: Fucking... What the fuck. Who the fuck fucked this fucking... How did you two fucking fucks...
    [shouts]
    Rocco: fuck!
    Connor: Well, that certainly illustrates the diversity of the word.
    
    Think that just about covers it...
    
  - Re:Foreign languages are complex... (Score:2)
    
    by mwood ( 25379 ) writes:
    
    "Did he not" and "did not he" both work.
    
    But when did you last see anyone write either of these forms?
- Re:Foreign languages are complex... (Score:3, Interesting)
  
  by Mushdot ( 943219 ) writes:
  
  I have a friend who works in Japan and he tells me the same. He often goes to watch English films that are subtitled in Japanese and tells me that they completely mistranslate most of the jokes and miss subtle nuances of speech. One example he gave was a scene from 'The Full Monty' (I'm doing this from distant memory so it might not be quite right - in fact, a bad translation :-)
  
  One of the characters is shouting up to someone in their bedroom window. They don't respond to the shouting and the character says "H
  - Re:Foreign languages are complex... (Score:2)
    
    by pubjames ( 468013 ) writes:
    
    Another example of this I saw in a French film recently. A character was overhearing a conversation about a ship being under quarantine. He said "Is it the captain's birthday?" Makes no sense at all in English but in French it is a play on words and (feeble) joke. Impossible to translate.
    - Re:Foreign languages are complex... (Score:2)
      
      by Red Alastor ( 742410 ) writes:
      
      Do you remember the joke ? I speak French and I can't figure out what it originally was.
      - Re:Foreign languages are complex... (Score:2)
        
        by pubjames ( 468013 ) writes:
        
        The (stupid) character assumed that the captain was having a fortieth birthday party - forty being "quarante" in French, so a "quarantaine" sounds a bit like a word for a fortieth birthday party. I said it was feeble. But it is an example of a joke that's impossible to translate.
        
        - Re:Foreign languages are complex... (Score:2)
          
          by Red Alastor ( 742410 ) writes:
          
          That's far-fetched, since the right word would be "quarantième" and not "quarantaine". Besides, it's hard to make a sentence that doesn't make it obvious that it's the ship and not the captain who's the subject.
          
          An example of English / French I saw in a movie:
          
          - Yeah but...
          - What about my butt ?!
          
          I don't remember at all how they translated that :)
  - Re:Foreign languages are complex... (Score:2)
    
    by pubjames ( 468013 ) writes:
    
    I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only.
    
    I think it is more to do with the fact that they have to write the subtitles so that they can be read at the speed of the speech. And so they cannot go into subtleties. In fact often when there is fast dialogue they will miss whole phrases out.
  - Japanese and English are quite different (Score:3, Insightful)
    
    by Ogemaniac ( 841129 ) writes:
    
    and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually - we are laughing at things Japanese aren't always intended to find funny, while missing half of the jokes that are supposed to be there.
- Re:Foreign languages are complex... (Score:2, Insightful)
  
  by virtualsid ( 250885 ) writes:
  
  I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.
  
  I'm not quite sure what you mean here not bother because of this technology?
  
  I can't see anyone not wanting to bother learning a language because of this technology. Not unless it was a babelfish/universal translator type technology - i.e. basically invisible. In which case, what's the issue? ;-)
  
  What are you going to do:
  a) Walk around with a little device which translates with 60-80% accura
  - Re:Foreign languages are complex... (Score:2)
    
    by pubjames ( 468013 ) writes:
    
    I'm not quite sure what you mean here not bother because of this technology?
    
    Perhaps you are not like most people... I often hear English only speaking people say there is no point in learning another language because everyone learns English these days. This just gives them another excuse.
- Re:Foreign languages are complex... (Score:3, Insightful)
  
  by anum ( 799950 ) writes:
  
  Learning a foreign language is a net good and the only way to really understand another culture is to experience it. That said, there are a large number of languages and an even larger number of cultures. Do you intend to learn/experience them all?
  
  Can you see no good in a rough translation for some purposes?
  
  Calculators have largely eliminated the need (and in some cases the ability) for people to do basic math. Therefore we should eliminate calculators before these people start believing that they comple
  - Re:Foreign languages are complex... (Score:2)
    
    by pubjames ( 468013 ) writes:
    
    Can you see no good in a rough translation for some purposes?
    
    Of course.
    
    But from the description I think this is being developed for military or intelligence work. In those fields, mistranslations can cause death. And unfortunately I think the current administration is unsophisticated enough to think that machine translation is better than (more expensive) human translation.
    - Re:Foreign languages are complex... (Score:3, Interesting)
      
      by anum ( 799950 ) writes:
      
      Ya, I got ya'.
      
      I almost added "I just hope GWB doesn't decide to fire all his intell linguists based on this post" but it seemed kind of like bashing the Prez and i would never do that...
      
      Cheers
  - Re:Foreign languages are complex... (Score:2)
    
    by mwood ( 25379 ) writes:
    
    Time to check out that Asimov story about a society where mechanical computation was so pervasive that people no longer learned arithmetic. "The Feeling of Power"
- Re:Foreign languages are complex... (Score:2)
  
  by Archibald Buttle ( 536586 ) writes:
  
  There's a really simple reason why film subtitles omit jokes and get things wrong. It is almost never possible to directly translate from one language to another, so subtitles inevitably have to be an approximation of the original speech in order to help match the pacing of the original film. They also have to not be too wordy, since the viewer needs to watch the film, as well as read the subtitles.
  
  Language is about more than just words, it's about phrases too. A speaker's choice of words and phrases gives
  - Re:Foreign languages are complex... (Score:2)
    
    by mwood ( 25379 ) writes:
    
    Read the English translation of Lem's _Cyberiad_ before you tell us how impossible it is to translate humor. I'll buy the time-to-read argument, though.
  - - Re:Foreign languages are complex... (Score:2)
      
      by makomk ( 752139 ) writes:
      
      Then there's news bulletins and documentaries where an English translation is loudly superimposed over the top of native speaker. You can still hear the tones of the person they're interviewing but it's drowned out due to the translation.
      
      OT, but that reminds me - I saw a docudrama (in English) a while back where you could tell the documentary bit from the drama bit because the actors spoke in a foreign language and were subtitled, while the actual interviewees who spoke in a foreign language had a spoke
- Re:Foreign languages are complex... (Score:2)
  
  by Julian Morrison ( 5575 ) writes:
  
  I don't necessarily agree. Like most tech it's a tool - the task is up to the user. I find that fansubbed anime helps my Japanese. I'm picking out words and grammar from the flow of speech and simultaneously matching them against the translation. Often I can actually pick out where the translation was fudged or the subtleties were left out. Without the feedback from the subscripts, I wouldn't have that yet.
  
  On the other hand, there are cases where I just want to read something quickly, and putting the page
Ghee... (Score:4, Insightful)

by Anonymous Coward writes: on Wednesday January 25, 2006 @05:54AM (#14555864)

Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?

- Re:Ghee... (Score:2)
  
  by forgotten_my_nick ( 802929 ) writes:
  
  > cough cough echelon cough
  
  Funny you should mention that. I recall a US government department set up just after 9/11 which one of the things it would be working on was a handheld device that could translate from English to Arabic on the fly.
  
  Only reason I recall this is because the logo of said department was the all seeing eye shining some kind of beam over the rest of the world. Perhaps someone with a better TFH than me has a link. :)
  - Re:Ghee... (Score:2)
    
    by amliebsch ( 724858 ) writes:
    
    You're probably thinking of the (now-defunct?) Information Awareness Office [thememoryhole.org].
If they REALLY want to test it properly... (Score:5, Funny)

by Viol8 ( 599362 ) writes: on Wednesday January 25, 2006 @05:56AM (#14555872) Homepage

...they should send it to Glasgow on a Saturday night just after the pubs
have closed.

"Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll
shoo ye!"

It isn't worth it (Score:5, Funny)

by YearOfTheDragon ( 527417 ) writes: on Wednesday January 25, 2006 @06:00AM (#14555893) Homepage

Maybe IBM is going to make speech recognition a reality, but Bill Gates said that this was possible a long time ago [mpt.net.nz]. Simply genius.

On-The-Fly (Score:5, Informative)

by Trurl's Machine ( 651488 ) writes: on Wednesday January 25, 2006 @06:02AM (#14555901) Journal

They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?

I have read a lot of auto-translated documents and it is always good for a laugh, a real "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into a good context. A "fly" translated as an insect would be counted as a success!

Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to make a ridiculous faux-pas. Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...
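
To make the vocabulary-versus-context distinction concrete, here is a minimal sketch comparing a vocabulary hit rate with a sense-level accuracy. Everything in it is hypothetical: the tiny `lexicon`, the sense labels and the first-sense picking rule are assumptions for illustration only, and nothing here reflects how IBM actually scores its system.

```python
from typing import Dict, List

# Hypothetical toy lexicon: each word maps to its possible senses,
# with the most common sense listed first.
lexicon: Dict[str, List[str]] = {
    "they": ["pronoun"],
    "translate": ["verb"],
    "on": ["surface", "idiom"],
    "the": ["article"],
    "fly": ["insect", "idiom"],
}


def vocab_hit_rate(words: List[str], lex: Dict[str, List[str]]) -> float:
    """Fraction of words that merely appear in the vocabulary."""
    return sum(w in lex for w in words) / len(words)


def first_sense(words: List[str], lex: Dict[str, List[str]]) -> List[str]:
    """A naive translator: always pick the first listed sense."""
    return [lex[w][0] if w in lex else "unknown" for w in words]


def sense_accuracy(chosen: List[str], intended: List[str]) -> float:
    """Fraction of words whose chosen sense matches the intended one."""
    return sum(c == i for c, i in zip(chosen, intended)) / len(intended)


words = ["they", "translate", "on", "the", "fly"]
# What the speaker meant: "on the fly" is used idiomatically.
intended = ["pronoun", "verb", "idiom", "article", "idiom"]
chosen = first_sense(words, lexicon)

print(vocab_hit_rate(words, lexicon))    # 1.0
print(sense_accuracy(chosen, intended))  # 0.6
```

Under these assumptions every word is "found", so a vocabulary-based score reports 100%, while only three of the five words get the intended sense and "fly" still comes out as an insect.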

- Re:On-The-Fly (Score:2)
  
  by Aceticon ( 140883 ) writes:
  
  Portuguese is spoken in both Portugal and Brazil.
  
  Still, for example, the slang word used in Portugal for "traffic jam" (bicha) is the slang word in Brazil for "gay".
  
  Talking about the congestion on the streets of Lisbon takes on a whole new meaning in Brazil.
- Re:On-The-Fly (Score:2)
  
  by blackest_k ( 761565 ) writes:
  
  Machine translation is ropey, admittedly, but one of the best for Polish-English translation is
  English Translator3 www.techland.pl
  Earlier versions didn't know the difference between a shower of rain and taking a shower, for instance, although you still need to take care with Polish and polish: the capital P makes a difference.
  It does provide alternative translations, so you can do a basic translation and then apply a more appropriate one.
  It's getting old now, so perhaps there has been an update.
- Re:On-The-Fly (Score:2)
  
  by Cro Magnon ( 467622 ) writes:
  
  Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved an euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-transla
- - Re:On-The-Fly (Score:3, Insightful)
    
    by Red Alastor ( 742410 ) writes:
    
    However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did - they fed in oodles of texts in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (eg. 'he's kicked the bucket'(English/UK) == 'he broke his pipe' (French/FR)).
    The problem is that while French/FR people will understand the expression, others like French/CA won't. And even if they did spec
IBM and Google cooperation to come? (Score:3, Interesting)

by Mostly a lurker ( 634878 ) writes: on Wednesday January 25, 2006 @06:13AM (#14555934)

IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) [slashdot.org] has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.

This won't make speech recognition mainstream (Score:4, Interesting)

by thbb ( 200684 ) writes: on Wednesday January 25, 2006 @06:16AM (#14555949) Homepage

As it has been the case for the past thirty years, the description of the prowesses of the system are still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...

Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition [umd.edu].

One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planification too. Hence, you can't plan as well what to do next when you speak, which is a big hurdle in the type of intellectual activities one carries with a computer.

- Re:This won't make speech recognition mainstream (Score:2)
  
  by aug24 ( 38229 ) writes:
  
  'Planification'?
  Hmmm, this computer's going to have a hard time understanding you.
  Justin.
  - Re:This won't make speech recognition mainstream (Score:2)
    
    by mwood ( 25379 ) writes:
    
    Planification: the process (ation) of making (fic) plans. Easy. I would have said "planning" or "generating plans" though. "Planification" is probably a term of art. Eventually the recognizer would be set up to know this, and the metadata indicating domain-specificity could even help it work out the rest of the sentence.
  - - Re:This won't make speech recognition mainstream (Score:2)
      
      by aug24 ( 38229 ) writes:
      
      Absolutely - should've put a smiley on the end to make my meaning clear.
      
      J.
Awful default TTS (Score:4, Insightful)

by Council ( 514577 ) writes: <rmunroe.gmail@com> on Wednesday January 25, 2006 @06:19AM (#14555957) Homepage

Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.

- Re:Awful default TTS (Score:2, Informative)
  
  by wfWebber ( 715881 ) writes:
  
  Then again, if they supplied a version that produced awesome quality voices, they'd be accused of trying to kill their TTS competition.
  
  That said, in Microsoft Windows Vista (ETA 2019), the default TTS engine will be replaced by a new one sporting Anna [wikipedia.org]. Have heard her in the preview and I have to say, it's one hell of an improvement.
- Re:Awful default TTS (Score:2)
  
  by Viol8 ( 599362 ) writes:
  
  Probably BECAUSE speech is a niche market, MS don't want to spend the
  money on making it any better. So long as it sort-of works then the marketing
  droids have something apparently bleeding edge to waffle on about in the sales
  pitch knowing full well very few people will use it and discover how crap it
  is, and the ones who do are such a small percentage anyway that they won't care.
- - Re:Awful default TTS (Score:2)
    
    by mrjb ( 547783 ) writes:
    
    Amiga? In 1982, the TI-99/4a with Terminal Emulator II and speech synthesizer already did what XP's tin man does nowadays. Pity that machine was crippleware, you had to buy all kinds of add-ons for it to get some power from it.
What about SubHuman Speech? (Score:2)

by shotgunefx ( 239460 ) writes:

Serious, you hear how some people "talk" these days?
American or English? (Score:3, Interesting)

by squoozer ( 730327 ) writes: on Wednesday January 25, 2006 @06:30AM (#14555989)

I realize that Americans and British (English at least ;o)) speak essentially the same language but I have yet to find any speech recognition software that can get more than roughly 85% of what I say correct. I have a fairly soft neutral English accent with pretty good enunciation so I would have expected to be getting a recognition rate in the high 90%s. I'm wondering if, as most of this software is developed in the US, it is tuned specifically to pick up on English with a US accent? I realize that you train the software for your voice but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?
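
For anyone who wants to put a number on "roughly 85% correct", here is a minimal sketch. It assumes accuracy is judged by word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words said. The function name `word_error_rate` is made up for this illustration and is not taken from any speech product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word was dropped
                curr[j - 1] + 1,     # extra word was inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two substitutions plus two insertions against 4 reference words: WER 1.0
print(word_error_rate("I can recognise speech", "I can wreck a nice beach"))
# One substituted word out of 7
print(word_error_rate("which witch blew the blue candle out",
                      "which which blew the blue candle out"))
```

On this measure "85% correct" would be a WER of about 0.15, i.e. roughly one wrong word in every seven. Note that WER can exceed 1.0 when the recognizer inserts a lot of extra words.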

- Re:American or English? (Score:3, Funny)
  
  by Vengeance ( 46019 ) writes:
  
  I'm sorry, what?!?!?
  
  I cannot understand a word you're saying. What's with that accent?
- Re:American or English? (Score:2)
  
  by IamTheRealMike ( 537420 ) writes:
  
  Existing speech recognition engines rely on statistical approaches to disambiguate sounds and words, just like this "miracle" product does, and yes, about 80% accuracy sounds right. Of course this is too low to compete with a keyboard: even though dictating could be a lot faster than typing, by the time you have corrected all the mistakes it works out slower, which is why it's only used in limited applications.
  I have virtually no accent at all, except for very mild British overtones, yet speech recogniti
  - Re:American or English? (Score:2)
    
    by djmurdoch ( 306849 ) writes:
    
    I have virtually no accent at all, except for very mild British overtones...
    
    That claim makes no sense whatsoever. You have a regional accent, it just happens to come close to the one you hear around you most commonly. I'm guessing it's a midwest accent, aka "General American", aka the US TV network announcer accent.
- - Re:Tip (Score:2)
    
    by squoozer ( 730327 ) writes:
    
    I gave up on speech recognition as anything but a toy a while ago but your tip could lead to some interesting mistakes. Take for instance the sentence fragment "Running to the door". If it is pronounced as you suggest it could easily be misunderstood by the machine to be "run in to the door" which could have nasty consequences.
Oh oh oh. (Score:3, Funny)

by Anonymous Coward writes: on Wednesday January 25, 2006 @06:33AM (#14556003)

I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.

So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.

Not _that_ amazing (Score:2, Interesting)

by johndoe42 ( 179131 ) writes:

It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.

The translation, on the other
Buyer beware (Score:5, Insightful)

by 99luftballon ( 838486 ) writes: on Wednesday January 25, 2006 @07:04AM (#14556085)

Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a software side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.

So don't rush to buy. Let the labs check it out first.

I'll just be happy if (Score:2)

by el_womble ( 779715 ) writes:

it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.

Nintendogs: I've stopped trying to train my dog, it's never going to happen.
Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
Nokia: Even with just one voice command, my girlfriends
- Re:I'll just be happy if (Score:2)
  
  by Cro Magnon ( 467622 ) writes:
  
  I once read about someone dictating to his voice-recog software when 2 of his cow-orkers stopped by. He said "Hi, Nick and Ben". The software printed "Hi, naked men".
funny this subject should come up... (Score:2, Interesting)

by dafragsta ( 577711 ) writes:

I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha

The training process definitely has its ups and downs. The more you work with it however, the more it becomes attuned to your own speech patterns and moreover, the quirky words we use every day. I
Real-time eavesdropping (Score:2, Interesting)

by 0xC2 ( 896799 ) writes:

Although most of the discussion so far has focused on foreign language translation, this technology is about *real-time-audio-to-text* conversion. The feds will be able to monitor, analyze, and record our conversations in real time:

Monitor all conversation.
Apply real-time text filters.
Assign live agents to priority eavesdropping.
Profit!

If you could apply a filter to listen in to any call what would it be?
Finally! (Score:2)

by digitaldc ( 879047 ) writes:

We can figure out just what the hell Ozzy Osbourne is saying!
Translating Arab TV (Score:3, Informative)

by Perl-Pusher ( 555592 ) writes: on Wednesday January 25, 2006 @08:56AM (#14556481)

I imagine it is easier to translate repetitive phrases such as "The zionist oppressor shall be eliminated", "The great Satan America will be destroyed" and "Our martyrs have struck fear in the hearts of the infidels".
I was in Kuwait and watched Arab TV with English subtitles, it was enlightening to say the least. One long tribute to racism paid for by the Amir of Qatar. Only on Arab TV will you see such trash as "the Jews are descended from pigs".

Big deal, I can do that on my Apple ][ (Score:2)

by Fear the Clam ( 230933 ) writes:

One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.

10 PRINT "DEATH TO AMERICA";
20 GOTO 10

RUN
Speech Synthesis. (Score:2)

by crhylove ( 205956 ) writes:

So I think there should be a program to resynthesize the "learned" words into the most exact average of any given way to say it. I'd love to hear the results, that would be fascinating.
Excellent Product, Confused Reviewers (Score:2, Informative)

by MarsGov ( 300325 ) writes:

ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained it's possible to achieve very decent accuracy.
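
To illustrate why a constrained vocabulary helps, here is a rough sketch that works on text rather than audio. It assumes the recognizer has already produced a noisy guess and simply snaps that guess to the closest entry in a small command list, using Python's standard `difflib`. The command list and the name `snap_to_command` are invented for the example and have nothing to do with how ViaVoice Embedded actually works.

```python
import difflib

# Hypothetical in-car command set; a real product would define its own.
COMMANDS = ["radio on", "radio off", "volume up", "volume down", "call home"]


def snap_to_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is close enough."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(snap_to_command("volume cup"))   # a slightly misheard command
print(snap_to_command("which witch blew the blue candle out"))  # out of domain
```

With only five valid outputs, a slightly wrong guess still lands on the right command, and anything far outside the domain is rejected instead of being transcribed badly. Open dictation has no such list to fall back on.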

Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system will use your voic
Let's see it translate poems (Score:3, Interesting)

by roman_mir ( 125474 ) writes: on Wednesday January 25, 2006 @09:57AM (#14556900) Homepage Journal

When and if it can translate poems [slashdot.org] from language to language, while keeping the style, the nuances, the rhythm, the cultural references, the general idea and the details, then we will know - it is done. Until then, don't hold your breath.

- Re:Let's see it translate poems (Score:4, Interesting)
  
  by hunterx11 ( 778171 ) writes: <hunterx11@g m a i l . c om> on Wednesday January 25, 2006 @10:23AM (#14557112) Homepage Journal
  
  I'd be happy enough if humans could do this.
  
Anime fansubs! (Score:2)

by CptNerd ( 455084 ) writes:

What a boon this will be to those anime fansub groups who can't find decent translators, or at least translators who aren't overworked.
Thanks for the laugh! (Score:2)

by Ancient_Hacker ( 751168 ) writes:

Ah yes, super-duper speech recognition is right around the corner!
I've been hearing this every 6 months for about the last, oh, thirty years.
Given that the state of the art in something much simpler, like automatic language translation, is pitifully inadequate, how likely is it IBM has conquered speech recognition AND translation?
Har har har.
S-to-T in hospitals (Score:2, Interesting)

by stardancer ( 665878 ) writes:

I know that one hospital in Norway has been experimenting with/testing speech-to-text software for a while, and reports say it's been very successful! (This supports what was said about speech recognition within a tight context in an earlier comment.) I believe the plan is to, at some point, eliminate the need for secretaries transcribing what the doctors dictate, so that ideally the doctors can just speak into a mic and the text automagically appears in the patient's (electronic/digital) journal!
this of
Live experiment with Dragon 8 (Score:4, Funny)

by bdwoolman ( 561635 ) writes: on Wednesday January 25, 2006 @10:47AM (#14557367) Homepage

Here we go:
I can wreck a nice beach. I can recognize speech.
Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.
I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.
It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.
Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.
Which which blew the blue candle. Failed on the second "which" the b*tch.
Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.
One trouble. You do have to put the mike to sleep during family discussions.
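
The "which which" failure above is the sort of thing a statistical language model is meant to catch. The following is a toy sketch only: it assumes a bigram model with add-one smoothing trained on four made-up sentences, and it is not a description of how Dragon or IBM score candidates. All names (`train_bigrams`, `log_score`, `pick`) are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams):
    """Sum of add-one smoothed log probabilities of each word given the previous one."""
    words = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(words, words[1:])
    )


def pick(candidates, unigrams, bigrams):
    """Choose the candidate transcription the model finds most likely."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


# Tiny made-up training text; a real system would use vastly more.
corpus = [
    "the witch blew the candle out",
    "which witch was it",
    "the blue candle",
    "which candle is blue",
]
unigrams, bigrams = train_bigrams(corpus)
candidates = [
    "which which blew the blue candle out",
    "which witch blew the blue candle out",
]
print(pick(candidates, unigrams, bigrams))
```

Both candidates sound identical, so the acoustic side cannot separate them; the model prefers the second only because it has seen "which witch" and "witch blew" before and has never seen "which which". With a different training text the preference could flip, which is the point about needing the right context.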

- Re:Just what we need... (Score:5, Insightful)
  
  by pubjames ( 468013 ) writes: on Wednesday January 25, 2006 @05:55AM (#14555869)
  
  More opportunities for Arabic speaking people to misinterpret western media.
  
  I think you've got it the wrong way round haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
  
  - Re:Just what we need... (Score:2)
    
    by MichaelSmith ( 789609 ) writes:
    
    Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
    Yeah, that too.
  - Re:Just what we need... (Score:2, Funny)
    
    by meringuoid ( 568297 ) writes:
    
    Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
    Pah. English-speaking people never misinterpret Arabic media. al-Jazeera is a terrorist front organisation and ought to be bombed, and that's all there is to it!
- Re:Just what we need... (Score:4, Insightful)
  
  by user9918277462 ( 834092 ) writes: on Wednesday January 25, 2006 @07:13AM (#14556116) Journal
  
  There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
  
  - Re:Just what we need... (Score:2)
    
    by pev ( 2186 ) writes:
    
    I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
    
    And more importantly (for them) no pesky staff translators with a conscience leaking what they transcribed [bbc.co.uk] for the greater good.
    
    ~Pev
- - Re:Just what we need... (Score:3, Insightful)
    
    by mwood ( 25379 ) writes:
    
    Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.
- Re:Opensource? (Score:3, Insightful)
  
  by omeg ( 907329 ) writes:
  
  Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.
