Follow Slashdot blog updates by subscribing to our blog RSS feed

Microsoft Speech Recognition Now As Accurate As Professional Transcribers (techcrunch.com) 176

Posted by EditorDavid on Monday August 21, 2017 @07:30AM from the speech-to-texts dept.

An anonymous reader quotes TechCrunch: Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This surpasses the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times. Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that have been used by researchers to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research with the goal of achieving the same level of accuracy as a group of human transcribers who were able to listen to what they were transcribing several times, access its conversational context and work with other transcribers.

This discussion has been archived. No new comments can be posted.

Microsoft Speech Recognition Now As Accurate As Professional Transcribers

Load All Comments

Search 176 Comments Log In/Create an Account

Comments Filter:

Laughable Hype (Score:5, Interesting)

by bwanagary ( 522899 ) writes: on Monday August 21, 2017 @07:36AM (#55055851)

On a daily basis in my work environment Microsoft technology is used to a) record voicemail and b) generate text from the speech. Never, ever, have I received any converted voicemail that wasn't completely unintelligible gibberish. Seriously. This is utter nonsense.

Share
twitter facebook
- Re:Laughable Hype (Score:5, Funny)
  
  by avandesande ( 143899 ) writes: on Monday August 21, 2017 @08:37AM (#55056089) Journal
  
  You should start talking with people who don't speak gibberish.
  
  Parent Share
  twitter facebook
  - Re:Laughable Hype (Score:4, Insightful)
    
    by bobstreo ( 1320787 ) writes: on Monday August 21, 2017 @08:42AM (#55056115)
    
    You should start talking with people who don't speak gibberish.
    Yeah, but Mumbai is on the phone with us again...
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by Opportunist ( 166417 ) writes:
    
    Then stop outsourcing to countries where this is the native language.
- Re: (Score:2)
  
  by skids ( 119237 ) writes:
  
  The missing part in this equation is the quality of the "human transcribers". I worked a few mturk transcription microjobs JOOC a decade or so back. Occasionally the job was to validate another person's transcription. It was rather awful. I don't blame them, though, because the pay is rather awful, too, especially for a job that pretty much monopolizes your attention.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- - Re: (Score:2)
    
    by jellomizer ( 103300 ) writes:
    
    That isn't the fault of speech recognition, but context recognition.
    If you had a command prompt and say Open My Files it would do the same thing.
    - Re: (Score:3)
      
      by Luthair ( 847766 ) writes:
      
      No, context recognition would mean the correct word but wrong meaning. Buy and My are clearly distinct words with different pronunciation.
    - Re: (Score:2)
      
      by arth1 ( 260657 ) writes:
      
      If you had a command prompt and say Open My Files it would do the same thing
      My experience with voice recognition is that if I said "open my files", it would interpret that as "reboot now".
    - Re: (Score:2)
      
      by angel'o'sphere ( 80593 ) writes:
      
      Actually it would give an error: 'My' - file not found. 'Files' - file not found.
      Yes, macOS/OS X has an "open" command. And "open 'My Files'" would have worked just fine, supposed you had a folder or file called 'My Files'.
- - Re: Laughable Hype (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    We have a up to date Microsoft service doing this at my work. Accuracy is a running joke and I regularly forward people their transcriptions so we all get a good laugh. This might be lab quality recordings with limitations on launguage complexity used to cut down on errors. Error rate of a closed set test isnt really a great indicator. Now a year long comparison against several call centers in multiple industries would be quite compelling.
    - Re: (Score:2)
      
      by avandesande ( 143899 ) writes:
      
      I've been getting these voicemail transcriptions from Vonage for years, and although it screws up a lot I get the gist of what the person is saying which is enough. It's certainly better than having to call into a voicemail system.
  - Re:Laughable Hype (Score:4, Insightful)
    
    by Luthair ( 847766 ) writes: on Monday August 21, 2017 @09:57AM (#55056463)
    
    3) How much background noise? Are these from people calling from cell phones. Or a LAN line.
    Why does it matter? If it doesn't function in a standard operating environment then it isn't doing as claimed. What would you say to a watch maker who claimed their product was unscratchable but testing consisted of rubbing it with microfibre cloth?
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by rcharbon ( 123915 ) writes:
      
      If there was speech recognition that was 99.9% accurate for me if I were to stuff my leg down my throat first, get me a bone saw.
    - Re: (Score:2)
      
      by swillden ( 191260 ) writes:
      
      Or a LAN line.
      You mean "land line". The terminology predates IP telephony, and in any case the term clearly cannot apply to local area networks.
      - Re: (Score:2)
        
        by Luthair ( 847766 ) writes:
        
        You replied to the wrong person.... I assumed he meant VOIP personally.
  - Re:Laughable Hype (Score:4, Insightful)
    
    by pr0fessor ( 1940368 ) writes: on Monday August 21, 2017 @11:08AM (#55056865)
    
    3.... I've tried various voice recognition software over the years and can say they are getting much better but if there is any background noise forget it.
    I quit trying to use siri because when I get in the car and ask siri for directions if my wife is with me I get siri saying "I couldn't find, 102 why the fuck street don't you type in the address like a regular shut up person damn it.
    
    Parent Share
    twitter facebook
  - - Re: (Score:3)
      
      by Chaset ( 552418 ) writes:
      
      I just read that as an IP phone connected to the LAN. I have one of those at work. It is theoretically better audio quality than the analog internal phone system it replaced. So cell phone=really bad, LAN line=really good audio quality.
    - Re: (Score:2)
      
      by thewolfkin ( 2790519 ) writes:
      
      Are these from people calling from cell phones. Or a LAN line?
      ROFL
      A LAN line?? WTF is a LAN line?
      or maybe a typographical error that was supposed to say LAND line. I often type other words from what I think and his fingers just auto put LAN instead of land. Or maybe he's young enough to not remember that it's "landline" and after hearing so many people talk about it he assumed the term was lanline. either way hardly a ROFL moment.
- - Re: (Score:2)
    
    by arth1 ( 260657 ) writes:
    
    Like if someone announces they are pregnant will it translate to zÃZo shÃ"ng guÃf zÃ
    That's rather embarazado...
Errors are not Errors (Score:5, Insightful)

by idji ( 984038 ) writes: on Monday August 21, 2017 @07:39AM (#55055861)

When a human transcriptionist makes a mistake you can usually work out what they meant. When Speech-to-text (STT) makes a mistake it is often gibberish. So objectively it is "better" at transcribing, but subjectively much worse.

Share
twitter facebook
- Re:Errors are not Errors (Score:5, Interesting)
  
  by AmiMoJo ( 196126 ) writes: on Monday August 21, 2017 @08:17AM (#55056009) Homepage Journal
  
  Not any more. One of the ways that they got the accuracy up so high is by giving the machine an understanding of English and common phrases, similar to what a human has. It's been used for input correction on smartphones for a while too, e.g. with the Google keyboard it can correct the previous word based on the next one you type if it realizes that they don't make sense together.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by JasterBobaMereel ( 1102861 ) writes:
    
    Unless it actually understands what is being said then it will always make mistakes that result in gibberish
    If they are saying that they have cracked this, then they have strong AI, and should be announcing it to the worlds press ... (they haven't)
    They have added some syntax and grammar rules... just like everybody else ...
    - Re:Errors are not Errors (Score:5, Interesting)
      
      by AmiMoJo ( 196126 ) writes: on Monday August 21, 2017 @09:03AM (#55056203) Homepage Journal
      
      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Chaset ( 552418 ) writes:
        
        A few years ago, a colleague of mine and I were working in Japan. He was writing up a request for a quote and ran it through Google Translate to check his Japanese; expecting to get back an English phrase that at least vaguely corresponded to what he wanted to convey. All I remember was that the output contained the phrase "stormy bedroom". I had no idea how that came from his original text. Anyways, I told him to forget using Google translate.
      - Re: (Score:2)
        
        by arth1 ( 260657 ) writes:
        
        It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.
        Google Translate relies on community suggestions and validation. See https://translate.google.com/c... [google.com]
        The problem is that not everyone who joins there are truly fluent in both languages, nor all that literate.
        
        Re: (Score:2)
        
        by thegarbz ( 1787294 ) writes:
        
        Generally if you crowd source this you end up with a pretty good result. You don't need anywhere near "everyone" to make it work.
    - Re: (Score:3)
      
      by hord ( 5016115 ) writes:
      
      The way the machine learning databases are built, it does understand what is being said. That's why it is so effective. This happens through the connections that are built inside the neural network along with the architecture of the network itself. They are now using context-sensitive data labeling to assign specific meaning to words that are generally ambiguous based on the text around these words. The neural net can learn over time which combinations of words are likely to fall within specific categor
      - Re:Errors are not Errors (Score:4, Interesting)
        
        by djinn6 ( 1868030 ) writes: on Monday August 21, 2017 @02:04PM (#55058017)
        
        The way the machine learning databases are built, it does understand what is being said.
        I think the word "understand" has a more general meaning than what you wrote later on. For it to understand what was being said, beyond making grammatical sense of the sentence, it needs to know the abstract concepts behind the words and be able to manipulate them.
        
        For example:
        Jeff is a software engineer, Kate is a software engineer, and Larry is also ...
        Can you finish the sentence?
        
        Most humans could do it with a high degree of accuracy. Some might even find the obvious answer so boring that they try for a more creative one. However, ML is still very far from that.
        
        Since it does not grasp the abstract concepts, its transcription is much more likely to lose meaning than a human transcriber. When talking about network technology for example, a human will not mis-transcribe "NAT" to "gnat", while a machine will.
        
        Parent Share
        twitter facebook
- Re:Errors are not Errors (Score:5, Informative)
  
  by jellomizer ( 103300 ) writes: on Monday August 21, 2017 @08:22AM (#55056043)
  
  Normally we have transcriptionist who are trained in a particular area to understand the context of the message. A legal transcriptionist requires different training then a Medical Transcriptionist.
  
  Parent Share
  twitter facebook
  - Re:Errors are not Errors (Score:5, Insightful)
    
    by gnick ( 1211984 ) writes: on Monday August 21, 2017 @09:39AM (#55056359) Homepage
    
    A legal transcriptionist requires different training then a Medical Transcriptionist.
    And sometimes even that training falls short. Does anyone remember the explosion at WIPP when the tech transcribed "an organic kitty litter" instead of "inorganic kitty litter"?
    Kitty litter explosion. [cnbc.com]
    
    Parent Share
    twitter facebook
    - - I guessed a search would tell about that. Found it (Score:2)
        
        by Futurepower(R) ( 558542 ) writes:
        
        The Miracle of Hiroshima -- Jesuits survived the atomic bomb thanks to the rosary [catholicnewsagency.com]
- Re: (Score:2)
  
  by Billly Gates ( 198444 ) writes:
  
  Just keep it recorded and have a human review it.
  This could cut costs greatly with this automation if it is true. Why pay 50 transcribers when you can pay 1 for a reduced wage since demand will now be lower and have the computer do the work for free?
  - Re:Errors are not Errors (Score:4, Interesting)
    
    by hord ( 5016115 ) writes: <jhord@carbon.cc> on Monday August 21, 2017 @11:14AM (#55056887)
    
    I'm not a statistician but it's possible that once you can prove that the neural network can produce answers at a success rate higher than humans you would be introducing error by allowing humans to review it. I'm not saying it shouldn't be done but this is one of the weird questions that people will have to ask on a case-by-case basis as these technologies are applied to real problems.
    
    Parent Share
    twitter facebook
- - Re:Errors are not Errors (Score:4, Insightful)
    
    by K. S. Kyosuke ( 729550 ) writes: on Monday August 21, 2017 @09:22AM (#55056289)
    
    Hey, it's going to cost $700 per minute but at least there will be no errors!
    So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?
    
    Parent Share
    twitter facebook
    - Re: (Score:3)
      
      by SeattleLawGuy ( 4561077 ) writes:
      
      Hey, it's going to cost $700 per minute but at least there will be no errors!
      So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?
      This will eventually bring down the costs of lawsuits by making court reporters less common, but that may take a few decades.
      Not many lawyers are $700 per minute. Even $700 per hour is rare.
      And do you know how much we have to pay to go through law school and have our senses of humor surgically removed?
  - Re: (Score:2)
    
    by Dripdry ( 1062282 ) writes:
    
    No amount of money can ameliorate sending your mother-in-law,"Ask her why her penis caught in her dress" instead of "Ask her why her pin is stuck in her dress"
    None.
    Ever.
- - Re: (Score:2)
    
    by Opportunist ( 166417 ) writes:
    
    Who is moving goalpost. The goalpost is "write what I said", and that didn't move an inch.
Using it to post on slashdot (Score:5, Funny)

by Harald Paulsen ( 621759 ) writes: on Monday August 21, 2017 @07:49AM (#55055901) Homepage

holyfield is these all of this was made worse by the fact that i had these birds skilled estimate uh... supplying itself what's your special prom to prevent fraud reform
thoughtfulness julia roberts police comments entry drug connections predicting that nighttime beating

Share
twitter facebook
Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Monday August 21, 2017 @08:07AM (#55055955)

Comment removed based on user account deletion

Share
twitter facebook
- Re: (Score:2)
  
  by peragrin ( 659227 ) writes:
  
  The recognition system is 5.9% accurate for the testers. For the rest of us it is far far off. Human testers are 5.9% accurate across a much larger selection of people.
  I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - Re: (Score:3)
    
    by ranton ( 36917 ) writes:
    
    I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.
    It is very odd that you have such a low success rate with voice recognition. At least 2/3 of my voice texts can be sent without editing, and most of the errors have to do with proper names. Are you sure you don't have an accent? My wife mumbles pretty bad when talking fast (so bad I don't like talking with her on the phone most of the time) but even she has a pretty easy job using voice to text now. It was pretty bad a few years ago but it really is amazing how much better it has become.
  - Re: (Score:2)
    
    by arth1 ( 260657 ) writes:
    
    I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.
    I can't get voice controlled phone systems to work.
    The main problem is that I have a deep voice, and these systems are built on the pareto principle - cutting off the 20% with the deepest or highest voices is considered acceptable. I refuse to squeak to be understood.
    Some of the phone systems have hardcoded that if you say "human" or "operator", it will take you to a human operator. The problem is that it doesn't recognize those keywords either. After the aggressive high pass filter on the voice recognit
- Re:Bad experiences on this front (Score:4, Interesting)
  
  by Baron_Yam ( 643147 ) writes: on Monday August 21, 2017 @08:21AM (#55056037)
  
  5.9% means it still gets more than 1 in 20 things wrong. That's a LOT when you're feeding the information into a system that requires pretty much a 0% error rate.
  Second, there's a huge difference between standard language and specialist syntax. With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.
  And finally - so long as they don't have a related disability - a proficient typist can already type about as fast as they can form decent code in their head. With a bit of 'mousework' for selection and cut-and-paste I don't see speech ever becoming the superior entry method unless and until we have genuine AI that understands your intent rather than your words.
  It might be nice to use speech as a macro-invoker, though.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - Re: (Score:2)
    
    by GrumpySteen ( 1250194 ) writes:
    
    With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.
    This story is about speech recognition being as good as transcription services. Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument.
    - Re: (Score:2)
      
      by Baron_Yam ( 643147 ) writes:
      
      >Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument
      Yet my post was in response to someone attempting to program by dictation, so somehow it seems completely relevant.
  - Re: (Score:2)
    
    by angel'o'sphere ( 80593 ) writes:
    
    Speech is 4 or 5 times slower than typing.
    So unless you can tell an IDE "look at package 'my.product.model' and 'my.product.entities', create a Factory based on ctor signatures for all 'entities' that implement interfaces from 'models' and return 'model' classes" voice input is pretty pointless. And I doubt an 'AI' will be able to do that soon, while my template based code generator does that instantly. But I start it with a mouse click (which is slower than a key board short cut, obviously).
- Re: (Score:2)
  
  by jellomizer ( 103300 ) writes:
  
  Writing code by voice? Are you insane.
  Speech is portraying ideas in a liner fashion. Coding you are jumping up and down filling different parts of the problem. At different time.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - Re: (Score:2)
    
    by angel'o'sphere ( 80593 ) writes:
    
    Coding you are jumping up and down filling different parts of the problem
    Erm, if you meant me with "you", then err, no!!
    I just write my code top down.
- Re: (Score:2)
  
  by ziggystarsky ( 3586525 ) writes:
  
  The reported error rate is for conversational English. This means that you cannot throw meaningless words at it. Modern speech recognition exploits grammatical and semantical structure. The stock recognizers can't do this for programming languages. You could train the model on a programming language, and certain constructs (like brackets, if-then-else) will see an improvement in recognition.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by skovnymfe ( 1671822 ) writes:
  
  Do you have access to internal MS Research software? Cool, bro. Can you hook me up with some access too? Because you must've used the internal MS Research software to do your anecdotal testing some months ago, since you've got an opinion on how good it is at doing its job.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re: (Score:2)
      
      by angel'o'sphere ( 80593 ) writes:
      
      The parent was just bad in natural language transcription into internal (mind) symbols and constructed a completely different meaning from your words than you intended.
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by SQLGuru ( 980662 ) writes:
        
        All of the Speech Recognition software that you commonly use is geared toward conversational language. You could create one that follows the language and grammar of code, but it would require different training. Consider the search suggestions you get when you type in the search bar.....that's how Speech Recognition works. Based on the previous words, it creates a list of likely next words and then determines which one matches the spoken words. When I type "void" into Google, it suggests to me the follo
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by SQLGuru ( 980662 ) writes:
        
        Did you build your own grammar? And variables needn't be a bunch of cryptic jibberish -- use meaningful variable names so your code is more readable; so real English words.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by angel'o'sphere ( 80593 ) writes:
        
        You are obviously not a programmer.
        'invoiceAddress', 'invoiceName', or other artificial words are used in programming.
        The speech recognition would interpret it as 'invoice address' and 'invoice name', hence the program would be 'broken'.
        Other examples are abbreviations, like fis (FileInputStream) for a variable name or fos (FileOutputStream). However I would assume a speech recognition software would be able to understand eF Eye eS.
        
        Re: (Score:2)
        
        by SQLGuru ( 980662 ) writes:
        
        Actually, I am a programmer. And it seems like he never bothered to give the speech engine the grammar of the programming language (the Backus-Naur form syntax). Programming languages are very prescriptive in where things go -- unlike English where some words can vary in their location.
        With the BNF defined, it should be easier to determine that "variable goes here" and the software could look for previously identified variable tokens to assist in the interpretation --- that's basically how IntelliSense wo
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
  - Re: (Score:2)
    
    by parkinglot777 ( 2563877 ) writes:
    
    a) This is about real human launguages [sic], not programming languages b) You wern't [sic] using the speech recognition software that this is talking about
    I would agree with point "a" but I would agree with point "b" only under certain conditions.
    If a software is specifically for speech and has built-in functionalities in attempt to auto-correct words to fit a context sentence, then I agree. However, if it simply transcribes a speech, then it can be used to do anything and does not need to be related to just a conversation recognition.
- - Re: (Score:3)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
"As Accurate As Professional Transcribers" (Score:5, Funny)

by Anonymous Coward writes: on Monday August 21, 2017 @08:08AM (#55055967)

"As Accurate As Professional Transcribers..."
They left out "from Uzbekistan transcribing Navajo - underwater".
Never trust anything Clippy say.

Share
twitter facebook
- Re: (Score:2)
  
  by skids ( 119237 ) writes:
  
  They left out "from Uzbekistan transcribing Navajo - underwater".
  and "...working on cell phones with auto-correct enabled"
Curious what the results are on modern hardware (Score:2)

by DraconPern ( 521756 ) writes:

They should do tests using modern hardware. For example the speech recognition on iOS seems to be pretty good. If they can get this technology into windows 10 that would be awesome. Oh I dictated this using iOS.
- Re: (Score:2)
  
  by religionofpeas ( 4511805 ) writes:
  
  You talk in a typewriter font ?
- Re: (Score:2)
  
  by qbast ( 1265706 ) writes:
  
  I tried the same paragraph on iPad - flawless recognition. Then on Windows 10 - this is the result: "He shouldn't have been tested in what example dispute recall he can get distinctive into windows and that would be on a dichotic and"
NSA (Score:2)

by Dan East ( 318230 ) writes:

The NSA would love this. Keyword scanning of 95% of what's spoken in phone conversations (given enough processing power to transcribe them all).
Yeah (Score:2)

by Dunbal ( 464142 ) * writes:

Just make sure you run it on an air gapped computer if you want your conversation to remain private.
Comically inaccurate (Score:2)

by Larry_Dillon ( 20347 ) writes:

At work we have an cloud-based Outlook that transcribes voicemail to text. It's so comically inaccurate that we sometimes forward the results to the sender and we both get a good laugh.
Microsoft Speech Recognition Now As Accurate - Say (Score:3, Interesting)

by WeBMartians ( 558189 ) writes: on Monday August 21, 2017 @09:15AM (#55056259)

If it can recognize "It's difficult to wreck a nice beach", I'll be thoroughly 'whelmed'.

Share
twitter facebook
"Show me to buy milk at this opportunity."anyone? (Score:2)

by itsme1234 ( 199680 ) writes:

The lameness filter is lame.
How does it do with... (Score:2)

by judoguy ( 534886 ) writes:

IgPay AtinLay?
- Re: (Score:2)
  
  by PoopJuggler ( 688445 ) writes:
  
  "Dear aunt, let's set so double the killer delete select all."
In which environment? (Score:3)

by Opportunist ( 166417 ) writes: on Monday August 21, 2017 @09:50AM (#55056421)

In a sound proof studio built for sound recording spoken by someone with speech training?
Or in an environment with 30 people talking in the background, an air condition running, doors and drawers slamming, people laughing, feet
and chairs shuffling across the floor, some photocopiers that got their last service before Bush left office whining for hours and a person speaking into the phone while at the same time talking to coworkers and you're expected to know which words belong to you and which ones are directed at someone else?
Aka "open plan office".

Share
twitter facebook
On the down side (Score:3)

by fahrbot-bot ( 874524 ) writes: on Monday August 21, 2017 @10:29AM (#55056621)

It still showed up at the South Park "Save Films from their Directors" club for the wrong reason when it heard, "Free Hat" [wikipedia.org].
(For those that aren't South Park followers...)
Cartman writes "Free Hat" on the advertising poster in the belief that freebies are necessary to attract people. However, the crowd mistakenly thinks the rally is to free Hat McCullough, a convicted baby killer they believe was innocent.
Now thinking that "Free Hat" would be a great name of one of those Windows App Store pirate streaming apps [slashdot.org] ...

Share
twitter facebook
The acid test (Score:2)

by John Jorsett ( 171560 ) writes:

Will it transcribe, "Diffused the situation," or "Defused the situation"? Every single TV closed-caption I've ever seen, and I've taken special note since I first became aware of this, has gone with the former. And those presumably have been humans making that error.
Hype, more hype, and maybe outright lies (Score:3)

by Rick Schumann ( 4662797 ) writes: on Monday August 21, 2017 @11:53AM (#55057063) Journal

If you believe Microsoft without independent verification from an otherwise uninterested third-party who has no investment in the outcome, then you're a fool.

Share
twitter facebook
5% (Score:3)

by MMC Monster ( 602931 ) writes: on Monday August 21, 2017 @12:43PM (#55057467)

One in 20 words is wrong?
How can a human transcriptionist be that bad?

Share
twitter facebook
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  It is not. Sure, humans get a word wrong, but they will only very rarely mangle the meaning. Machine transcription, on the other hand, will often get meaning wrong and that is a serious problem.
  The only thing this shows is use of an unsuitable (in fact, utterly stupid) metric for marketing purposes.
Nonsense (Score:2)

by gweihir ( 88907 ) writes:

Humans transcribers "have the advantage to be able to listen to the recording several times"? What utterly demented nonsense is that? Of course, the software, having the recording, can "listen" to it as often as it wants. There is absolutely no "advantage" here for the human transcribers.
Tad misleading (Score:2)

by Oligonicella ( 659917 ) writes:

puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times
As if the audio sails by the program and isn't stored in memory and parsed as many times as needed.
Works form me (Score:2)

by mnemotronic ( 586021 ) writes:

I fuse micro sot noise recognition ball the time it words fall Leslie.
ROTFLMAO!!! (Score:2)

by whitroth ( 9367 ) writes:

It is? And who decided *that*?
We've got it on our hybrid phones. At least half the time, the voice transcription "preview" resembles, randomly, Vogon poetry, or perhaps only "computer poetry" from 40 years ago. It rarely gets a name or title correct, and the message they're trying to leave, *maybe* 50% is close enough to guess what they meant, without listening to the mp3.
- Re: (Score:2)
  
  by lobiusmoop ( 305328 ) writes:
  
  That's from over 10 years ago, which in computing terms is ancient history.
- - Re: (Score:2)
    
    by Opportunist ( 166417 ) writes:
    
    We have arrived at the point where assuming that a company wants to invade your privacy is pretty much the default position.
- - Re: (Score:2)
    
    by jabuzz ( 182671 ) writes:
    
    Given that transcription is not a highly paid area, and that a moderate typist can transcribe pretty much as fast as as you talk, there is not a chance in hell you can fire 10 transcribers and hire two.
    However this is 2017, there is no need to have your transcription service in central London for example. Punt the audio file to somewhere else over the internet. It doesn't need to even leave the UK to be much cheaper than being in central London either.
    In fact this is perfect for homeworking to be honest. Es
    - - Re: (Score:2)
        
        by jabuzz ( 182671 ) writes:
        
        Because court transcribers are less than 0.1% of people doing transcription, that's why. No idea what it's like in the USA, but in the UK the NHS does not pay at that level for medical transcription services, and top law firms don't either. I very much doubt the court transcribers get paid that much either. I will however ask my brother (aka a real life Judge) what they get paid next week when I see him. However a quick google suggests 60GBP per a 5.5 hour day sitting after which overtime kicks in but rarel

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Laughable Hype (Score:5, Interesting)

Re:Laughable Hype (Score:5, Funny)

Re:Laughable Hype (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: Laughable Hype (Score:2, Insightful)

Re: (Score:2)

Re:Laughable Hype (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Laughable Hype (Score:4, Insightful)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Errors are not Errors (Score:5, Insightful)

Re:Errors are not Errors (Score:5, Interesting)

Re: (Score:2)

Re:Errors are not Errors (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re:Errors are not Errors (Score:4, Interesting)

Re:Errors are not Errors (Score:5, Informative)

Re:Errors are not Errors (Score:5, Insightful)

I guessed a search would tell about that. Found it (Score:2)

Re: (Score:2)

Re:Errors are not Errors (Score:4, Interesting)

Re:Errors are not Errors (Score:4, Insightful)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Using it to post on slashdot (Score:5, Funny)

Comment removed (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re:Bad experiences on this front (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

"As Accurate As Professional Transcribers" (Score:5, Funny)

Re: (Score:2)

Curious what the results are on modern hardware (Score:2)

Re: (Score:2)

Re: (Score:2)

NSA (Score:2)