
Microsoft Shows Off Adaptive, Multilingual Text to Speech System

MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language." The Techfest 2012 keynote has a demo of the technology around minute 13:00.

  • AZN (Score:2, Insightful)

    by willie3204 ( 444890 ) on Monday March 12, 2012 @10:15PM (#39334737)

    Japanese please!!!!

  • by ChatHuant ( 801522 ) on Tuesday March 13, 2012 @12:08AM (#39335511)

    > That said, I don't regret learning Spanish, but learning it just so you can get a cheaper tourist trap is not worth it at all.

    Of course it's not worth it if all the benefit you find in knowing another language is saving a couple of bucks at some touristy place. But knowing a different language is much more than that. You now have access to new worlds of literature, movies, poetry and music firsthand, without a translator to mediate (because, as the Italians say, "traduttore, traditore"!). You can talk to more people directly, understand their culture, expand your mind. You can read a whole set of new web sites, see different perspectives, or read news that isn't easily available otherwise. It opens lots of new possibilities for you - for example if you want to work for a global company, or if you ever feel like working in a different country for a few years. And even without any of those, the very effort of learning a different language improves your brain and slows mental aging.

    I'm relatively fluent in three languages now, and can more or less read another two. I read books in all of them, and I find it really enriches my mind. I just started learning a fourth (Japanese), and am really looking forward to reading Japanese books in their original form (even though learning enough of the kanji characters will be a pain).

  • by Phics ( 934282 ) on Tuesday March 13, 2012 @12:23AM (#39335575)

    > It's not garbage, and if they had real innovations, it would be nice. Instead, they've taken a few characteristics of a speaker, like pitch, and used those to model the computer voice in another language.

    No, if you listened to the keynote, they took speech characteristics, then broke the target voice pattern up into 5 ms pieces and reconstructed the voice to match a reference translation from a different language. What they are doing is not only very interesting but also clearly has room for improvement and a variety of applications.

    > It's about as interesting as if someone said, "what would you look like if you were a boy?" (or girl, if you are male), and then sampled your eye color, hair length, nose shape, etc, and then morphed those into a stock photo of a boy. Yeah, it would have some characteristics of you, but it also wouldn't be what you would look like if you were a boy.

    That's sort of the point. The sampled voice may not speak fluent Mandarin, but if you'd like it to, this technology will allow it to. A better analogy would be along the lines of taking a computerized sample of your body shape and texture (skin, hair, face, etc.) and then using 3D animation to reconstruct a model of you doing karate, even if you didn't actually know karate.

    Eventually, as the 'resolution' improves, the bits of this that you disapprove of (the computerized feel you are getting from the voice) will most certainly improve as well. But it's the underlying ideas and tech that are interesting here.

  • by NoKaOi ( 1415755 ) on Tuesday March 13, 2012 @12:44AM (#39335697)

    > He's selling a great idea, but it's kind of like the Fountain of Youth. It ain't there, vaporware.

    Is he actually trying to sell a mature product, or is he just showing something cool? I'm not sure where the innovation is: whether it's in being able to train text-to-speech to sound like your voice, in preserving intonation across the translation (even though it's obviously not great at it yet), or just in putting a few existing technologies together. But if you have speech recognition, a translator, and text-to-speech that sounds like your voice, then this is what you can have. Add intonation preservation and you have something cool. So what if it's just showing off a cool application of existing technologies?

    Translators aren't great but are getting better; speech recognition isn't great but is getting better. Preserving intonation across the translation and reproducing it in text-to-speech with a voice that sounds kinda like your own can probably get better too. Put the three together and you get something useful. I think that's all it's trying to show, and as these technologies get better we could end up with something pretty cool.

    If this were something out of any other company, would the same people be criticizing it?

  • by tenco ( 773732 ) on Tuesday March 13, 2012 @06:31AM (#39336861)

    ... if only my software could translate a bytestream of type video/x-ms-asf into a video.

    In light of this experience, why should I believe that someone actually invented a unidirectional universal translator? Nice try.
