OpenAI Reveals AI Tool To Recreate Human Voices (axios.com) 24
An anonymous reader quotes a report from Axios: OpenAI said on Friday it's allowed a small number of businesses to test a new tool that can recreate a person's voice from just a 15-second recording. The company said it is taking "a cautious and informed approach" to releasing the program, called Voice Engine, more broadly given the high risk of abuse presented by synthetic voice generators.
Based on the 15-second recording, the program can create an "emotive and realistic" natural-sounding voice that closely resembles the original speaker. This synthetic voice can then be used to read text inputs, even if the text isn't in the original speaker's native language. In one example offered by the company, an English speaker's voice was translated into Spanish, Mandarin, German, French and Japanese while preserving the speaker's native accent.
OpenAI said Voice Engine has so far been used to provide reading assistance to non-readers, translate content and to help people who are non-verbal. It said the program has already been used in its text-to-speech application and its ChatGPT Voice and Read Aloud tool. "We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities," the company said. "Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale."
Late to the party (Score:4, Insightful)
11labs (paid) and RVC (free, local) already do this, and the latter without censorship
... and down to clown. (Score:3)
11labs (paid) and RVC (free, local) already do this, and the latter without censorship
11labs needs a minimum of 30 minutes of speech, while RVC needs several minutes of recorded speech. 15 seconds is a huge difference; it means they are not depending on the voice producing every type of sound/transition and then trying to recreate each one.
If they can realistically produce a similar sounding voice using data measured in seconds then it's certain that they are using a fundamentally different approach than existing systems.
Re: (Score:2)
>11labs needs a minimum of 30 minutes of speech while RVC needs several minutes of a recorded speech.
That's wrong.
"Short on time? No worries. Even brief audio snippets can be effective for generating a reliable voice clone." https://elevenlabs.io/voice-cl... [elevenlabs.io]
I had good results with 11labs with less than a minute of audio.
RVC needs a few minutes, but there are forks that work with less than a minute, like Coqui, which needs 6 seconds: https://huggingface.co/coqui/X... [huggingface.co]
Re: (Score:2)
Re: (Score:2)
15ai was killed by the sole dev's insanity and commitment to being as closed as possible, and funneling people through Patreon. He also was basically running RVC and Tortoise, but the real quality was just finely curated datasets. It became a grift, and as of 6 months ago he was getting $500/mo. from his Patrons while releasing just teases. You can search "site:reddit.com 15ai" and learn more.
Re: (Score:2)
Re: (Score:2)
Certainly they could, and people have done that
Re: (Score:2)
YouTubers already using something similar (Score:2)
There I Ruined It has been using some AI tool to make song parodies to absolutely hilarious results. I think my latest favorite has to be the Bro Country Song. [youtube.com] I'm kind of disappointed someone hasn't made a Cybertruck commercial parody using it as the background music.
In general, I'm a fan of AI (Score:4, Insightful)
But WHY do the AI companies keep releasing stuff that allows evildoers to more easily do evil?
We need tools for drug development, X-ray reading, automated software security scans, improved analysis of physics data, and lots more beneficial stuff.
Meanwhile, the AI companies seem focused on making tools for scammers and worse.
Re: (Score:2)
In other news, knife companies keep making knives (Score:3)
Re: (Score:2, Informative)
You are under a misapprehension here: You seem to think AI companies are run by good people who would have honor, integrity and a desire to positively contribute to the human endeavor. That is not the case.
That makes no sense at all. (Score:2)
"Native accent"?? (Score:3)
What?
So the English speaker's original UK Received Pronunciation accent is preserved after translation into Japanese? How the flippety fuck does that work?
I even went the extra step and read the linked article [axios.com] itself. Same text, for this piece, with no additional information.
I rather suspect that they are misusing the word "accent" here.
(Written from the perspective of a professional translator and occasional interpreter for English and Japanese.)
Re: (Score:2)
Re: (Score:2)
Listen to the samples in the linked OpenAI blog post. It does indeed preserve the native accent (which makes it sound like an American trying to speak a foreign language).
Re: (Score:2)
Oofda. Preserving an American accent when rendering in other languages seems ... an unfortunate choice.
I also noticed a goof in the Japanese audio, where the final clause became "yuujou no kizu o iwaimashou" — "let's celebrate the wounds / scars of friendship", instead of kizuna, "connections / bonds / ties". The text spells it out correctly, but the audio is missing that na on the end of kizuna. That's not an issue of accent, that's just a plain old vocabulary goof. Interesting.
Re: (Score:2)
I think there is some tweaking they could do one way or the other. In the French pronunciation the American accent only very rarely comes through, with most Rs being rolled properly (quite pronounced even for a native speaker, I'd say). The German version has far more of the accent.
I would imagine that the training (result) can be skewed towards the correct pronunciation if desired (or even further towards the original speaker's accent, although it is pretty hard to imagine many valid use cases for that).
"Caution"? Nice joke! We all laughed... (Score:2)
Obviously, they will just be concerned that some of the crap they unleash upon the world could splatter back onto them. Apart from that, greed is good! They will obviously monetize this to the maximum degree possible, like they have done with all their products so far.
Sweet! (Score:2)