


Microsoft Shows Off Adaptive, Multilingual Text to Speech System

MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language." The Techfest 2012 keynote has a demo of the technology around minute 13:00.
This discussion has been archived. No new comments can be posted.


  • by cptdondo ( 59460 ) on Monday March 12, 2012 @11:16PM (#39335177) Journal


    I am bilingual in English and another language. When I go to that country, many of the tourist attractions have price lists in English, Spanish, Russian, Japanese, you name it. Then they have one in the local language. The prices on that one are half of what they are for the tourists. And they're written out in words, not numbers, so if you can't read them you're SOL.

    So yup, you don't need to speak the other guy's language, if you're willing to play by his rules.

  • by phantomfive ( 622387 ) on Tuesday March 13, 2012 @12:40AM (#39335685) Journal

    I just started learning a fourth (Japanese), and am really looking forward to reading Japanese books in their original form (even though learning enough of the kanji characters will be a pain).

    Might want to check out this book [], it is good. And since I'm giving completely unsolicited advice, the exposition of grammar in "Communicating with Japanese by the Total Method" is my favorite of all language textbooks I've seen.

  • by malakai ( 136531 ) on Tuesday March 13, 2012 @12:48AM (#39335723) Journal

    I foresee someone attempting a friendly gesture by offering to share her mother's recipe for "shut up."

    Context is context. Obviously, an English speaker hearing a Spanish speaker offer to share a recipe for "shut up" on a (up until this point) benign and friendly conference call is going to assume translation error. Better than that, translation software knows about these little mix-ups better than you do. With text to speech, there's not much to do but suffer the mistranslation (or maybe they play an audible 'ping' to warn about a context or idiosyncrasy error), but in a system that displays something on a device, these things tend to be shaded a different color, with options as to what other possible meaning they may have meant, based on context.

    One, our text translation software isn't foolproof, but people expect it to be.

    No, they don't. No one even expects paid human translators to be perfect.

    Two, live conversations depend upon both parties building on a shared experience. If each one has a different account of the experience, conversations break down very quickly. Ever tried to carry on a conversation with a schizophrenic?

    Honestly, with a schizophrenic, chances are I have, at some point in my life, on IRC. But more to your point, I've played games where opposing sides are communicating from different languages via Google Translate. Think Russia vs US, and the only way to talk to them is via delayed Google Translate results. It's slow, it's tedious, and yet we somehow managed to have amazing rapport with people of like mind. The assholes were still assholes via Google Translate, and the people we wanted to work with we managed to communicate with. Again, you are ignoring the fact that incrementally better translation is still better than its predecessor. For now. Sure, one day we'll identify some uncanny valley with voice translation, and we'll all spend lots of time plotting how bad the translation software has to be for us to feel it's robotic.... but for now, any small step forward is better than the previous one.

    Then again, this whole discussion is purely academic. Gene Roddenberry's estate will just claim prior art [] and prevent this from ever becoming a reality. Hopefully.

    Yup, god forbid someone spends time and money on a problem that sci-fi writers got to magically make disappear in one sentence, and a prop. Maybe someday some brilliant young chap will figure out how to make warp drive not require 3x the mass of the universe for power, and Gene's children can make some more cash. Hopefully.

  • by Gadget_Guy ( 627405 ) * on Tuesday March 13, 2012 @03:57AM (#39336419)

    They sell Microsoft Office for operating systems other than Windows.

    This concession to the antitrust authorities and Apple is something of an exception to the general rule and it was a brutal fight to make it come about.

    What rubbish! The first version of Microsoft Office EVER was for the Mac in August 1989. The Windows release came out in November 1990. With whom did they have this "brutal fight" to get this released for the Mac?

    Interestingly, according to Wikipedia [], after the release of Word for the Mac in 1985 (2 years after Word for MS-DOS and Xenix), "Word for Mac's sales were higher than its MS-DOS counterpart for at least four years". It seems that Microsoft were rather pragmatic about selling software where it would make a buck!

  • by msclrhd ( 1211086 ) on Tuesday March 13, 2012 @04:49AM (#39336565)

    Provided that the speech recognition engine is good enough, it can distinguish between the /Q/ and /A/ sounds in lot (British English: /lQt/, General American English: /lAt/), cot, hot, etc, with /A/ also appearing in father /fA:D@/. This will mean that the speech recognition engine will record the actual phonemes spoken, rather than the phonemes it thinks are being spoken. With this, it can then build up a database of phonemes to the recorded audio.

    When a given language is selected (strictly speaking it is a language + accent, as Liverpudlian English sounds different to Australian English and Mexican Spanish sounds different to Argentinian Spanish) it will have a set of rules that describe how to convert the text into phonemes specific to that accent (for example, "ook" is usually pronounced /Vk/ in English, but in Scouse English it can be /Vx/). These rules provide a set of phonemes required by the language+accent to speak it properly.

    The phonemes are transcriptions of IPA-based phonemes. If you plot the phonemes available by the voice on the phoneme charts, you can fill in more phonemes that are similar (e.g. using /A/ instead of /Q/ if the voice does not support /Q/, or an untrilled /r/ if the trilled version is not supported, where a trilled /r/ can be found in Spanish).

    Then, provided that the voice can handle all the phonemes in a language+accent, you can map between the two, allowing your English speaking voice to speak German, Chinese, Afrikaans or whatever language you have data for. The eSpeak text-to-speech program does a simple version of this to make the German, Polish, Swedish, Romanian, Dutch, Hungarian, French and Afrikaans MBROLA voices speak English.

    You can also use it to have a voice support different accents, provided you have the rules for producing the correct phonemes.
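    A minimal sketch of the two steps described above (accent-specific rules that turn text into phonemes, then substituting a nearby phoneme when the voice lacks one). The table contents, the `r_trilled`/`r_untrilled` labels and the function names are all made up for illustration; this is not how eSpeak or the Microsoft prototype is actually implemented.

```python
# Hypothetical sketch: the rule tables and names below are illustrative
# assumptions, not taken from eSpeak, MBROLA, or Microsoft's system.

# Accent-specific spelling-to-phoneme rules, keyed by language+accent.
# The "ook" entries mirror the example in the comment above.
ACCENT_RULES = {
    "en-GB": {"ook": ["V", "k"]},
    "en-GB-scouse": {"ook": ["V", "x"]},
}

# Nearby phonemes to try, in order, when a voice lacks the one requested.
FALLBACKS = {
    "Q": ["A"],                    # use /A/ if the voice has no /Q/
    "r_trilled": ["r_untrilled"],  # untrilled /r/ if no trilled /r/
}


def phonemes_for(spelling, accent):
    """Look up the phonemes a given accent uses for a spelling pattern."""
    return ACCENT_RULES[accent][spelling]


def map_to_voice(phonemes, voice_inventory):
    """Rewrite a phoneme sequence so it only uses phonemes the voice has."""
    mapped = []
    for phoneme in phonemes:
        if phoneme in voice_inventory:
            mapped.append(phoneme)
            continue
        # Try each listed substitute until one is available in the voice.
        for candidate in FALLBACKS.get(phoneme, []):
            if candidate in voice_inventory:
                mapped.append(candidate)
                break
        else:
            raise ValueError(f"voice cannot produce /{phoneme}/ and has no fallback")
    return mapped
```

    With a voice that has /A/ but not /Q/, `map_to_voice(["l", "Q", "t"], {"l", "A", "t"})` gives the General American-style sequence for "lot". A real system would pick substitutes by distance on the phoneme chart rather than from a hand-written table.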

  • by symbolset ( 646467 ) * on Tuesday March 13, 2012 @05:10AM (#39336631) Journal

    The selective memory of you 'softie fans is amazing. There's a reason for these things. In 1986 Windows looked like this []. Sales of Mac Office kept Microsoft alive in this period. Microsoft Office was moved to reinforce Windows as soon as Windows was a credible environment. Windows wasn't even a credible platform until Windows for Workgroups (Windows 3.11) was released in November 1993, some 7 years later (or 1/3 of the time to present day). Mac Office was so lagging for a long while after WfW launch that it was effectively discontinued, and Office's superior support of the Windows platform was a huge part of Windows assuming dominance over the superior Mac OS which had come to rely on Office, which now offered degraded inferior performance and features on the Mac OS. There were some other shenanigans you can read about in the above links. It was a very successful strategy you can read more about here [] - enough horrifying content to keep you awake for years. But if that's not enough, you might try these []. Microsoft through these lessons evolved a strategy where all their products have to reinforce each other, and that became their core strategy. And then...

    Apple got some traction in their TrueType font rendering patent suit against Microsoft [] and the Justice Department was closing in on an antitrust action [] legendary in its scope and reach. Bill Gates blinked, and they settled, and now there's Mac Office, but you can't say that it's fully supported. The Mac versions lag the Windows versions by some years and are not fully compatible with each other in ways that can't be explained by OS platform differences. The Office platform supports Windows now, as you can see by all the sockpuppets who come out every time somebody mentions some non-Windows operating system to say "you can't get Microsoft Office for that and you never will." And then the rest of us chime in "Application virtualization solves that problem."

    Eventually Microsoft discovered political advocacy and contributed in various ways to the installation of a government more supportive of their business activities. Then the enforcement of antitrust protections to limit them and protect us against their abuse of their monopoly became lax, and the limits were quashed until those protections expired. But that's another long story for another day.
