AI Researchers Analyze Similarities of Scarlett Johansson's Voice to OpenAI's 'Sky' (npr.org) 87
AI models can evaluate how similar voices are to each other. So NPR asked forensic voice experts at Arizona State University to compare the voice and speech patterns of OpenAI's "Sky" to Scarlett Johansson's...
The researchers measured Sky, based on audio from demos OpenAI delivered last week, against the voices of around 600 professional actresses. They found that Johansson's voice is more similar to Sky than 98% of the other actresses.
Yet she wasn't always the top hit in the multiple AI models that scanned the Sky voice. The researchers found that Sky was also reminiscent of other Hollywood stars, including Anne Hathaway and Keri Russell. The analysis of Sky often rated Hathaway and Russell as being even more similar to the AI than Johansson.
The lab study shows that the voices of Sky and Johansson have undeniable commonalities — something many listeners believed, and that now can be supported by statistical evidence, according to Arizona State University computer scientist Visar Berisha, who led the voice analysis in the school's College of Health Solutions and the College of Engineering. "Our analysis shows that the two voices are similar but likely not identical," Berisha said...
OpenAI maintains that Sky was not created with Johansson in mind, saying it was never meant to mimic the famous actress. "It's not her voice. It's not supposed to be. I'm sorry for the confusion. Clearly you think it is," Altman said at a conference this week. He said whether one voice is really similar to another will always be the subject of debate.
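To make the "more similar than 98% of the other actresses" figure concrete, here is a minimal sketch of one way such a percentile could be computed. It assumes each voice has already been reduced to a numeric embedding vector and that cosine similarity is the metric; the summary does not say which models or metrics the ASU researchers used, so the function names and the toy three-dimensional vectors below are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def percentile_rank(target, candidate, others):
    """Percent of `others` that are less similar to `target` than `candidate` is."""
    candidate_score = cosine_similarity(target, candidate)
    lower = sum(1 for o in others if cosine_similarity(target, o) < candidate_score)
    return 100.0 * lower / len(others)

# Made-up embeddings, for illustration only (hypothetical data).
sky = [1.0, 0.0, 0.0]
candidate = [0.9, 0.1, 0.0]
others = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]

rank = percentile_rank(sky, candidate, others)
print(f"candidate is more similar to the target than {rank:.0f}% of the others")
```

In this toy data one of the other voices scores higher than the candidate, which mirrors the point in the summary: a high percentile does not mean the candidate is the top hit.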
forgot to mention... (Score:3, Insightful)
that they tried to get her to license that voice to them.
THEY thought it was her.
Re:forgot to mention... (Score:5, Insightful)
TWICE. Before "Sky" they approached Johansson twice, trying to license her voice.
Whatever they claim, it's a bad look.
Re: (Score:2)
Maybe they like her voice. Other people do.
Many use/like speech style on the verge of flirty (Score:4, Interesting)
Maybe they like her voice. Other people do.
It's not the voice so much, it's the style of speaking. The style sounds like a college-aged woman who is on the verge of becoming flirty. Many, many women have on occasion adopted that style because people like it; they recognize and understand it.
Re:Many use/like speech style on the verge of flir (Score:4, Interesting)
No one said that it was on purpose. And yes, that's how it usually works. When acting in a certain way gets you what you want, you tend to do that more.
Sure. There's the talking to babies voice, the romantic voice, lots of different voices.
And not just women. I have my normal voice, my angry voice, which instead of yelling, is low and monotone. People tell me it is pretty scary. And then there is my "obey now" voice, which is apparently shocking, because if you are used to my normal soft spoken voice, it freaks you out. That's reserved for emergencies.
Re: (Score:2)
No this is not being fake - it is learning, adapting to the audience, and using effective communication. If I tell people I need to use the loo or baño and they are often confused, but saying bathroom is clearly understood, then I will use bathroom more often as it communicates effectively. The same is true of tone of voice, facial expression, and hand gestures.
Re: (Score:2)
Well that's what we call "fake people".
Absolutely not. Vocal communication is more than just words. Emotions are conveyed via style. Hence emoticons and emojis invented for textual conversations. ;-)
Re: (Score:2)
Yet people manage to communicate on Slashdot just fine without emojis. You can't even hear the inflection in my voice yet we are communicating just fine.
We do have emoticons. Consider this example I used in a different response:
"If a person is friendly or angry you can often tell from their speech regardless of language. Except German, that always sounds like an angry person. ;-)."
Re: (Score:1)
"Adopted that style"? Seriously? People change their voice on purpose for some perceived benefit?
Sure. Ever hear women adopting "vocal fry"? Drop their natural voice an octave and make it sound gravelly. Someone told them it makes them sound smart or man-like. Mostly it makes them sound stupid.
Or there is the case of Elizabeth Holmes of Theranos infamy. She didn't use vocal fry, but dropped her voice. I guess she thought it made her sound smart or something. Which is strange - they have recordings of her speaking in her natural voice, and her real voice is nice, a lot nicer than the weird affected v
It's how we convey emotions in speech (Score:2)
Sure, some people are fake people or even sociopaths I guess. But it's more of a character flaw than a common occurrence. Maybe it has been more normalised in the US. I find people who live in big cities tend to be more concerned about their image or getting things from people.
Not really. It's how humans convey emotions in speech. Speech is multi-channel: there are the words that have a specific meaning, but sometimes the words do not convey the emotional content. That is where tone of voice and style of speech come into play. That emotional channel might even be universal. If a person is friendly or angry you can often tell from their speech regardless of language. Except German, that always sounds like an angry person. ;-). == See the emoticon? What does it do, it communicates
Re: (Score:2)
Seriously? People change their voice on purpose for some perceived benefit?
Yes. They are called "actors".
Re: Many use/like speech style on the verge of fli (Score:2)
To me it sounds quite repulsive, very valley-girl-like.
Re:forgot to mention... (Score:5, Insightful)
OpenAI asserts now that "It's not her voice. It's not supposed to be", yet when the voice dropped CEO Altman Xittered "Her". Because that is exactly what a CEO does to show that the voice is not supposed to be confused with the actress voicing Her.
Re: (Score:2)
Nonsense. Xitter is a fine portmanteau. Note that Chinese pronunciation is used for the "X".
Re: (Score:3)
Especially the second time, because it was right before they released it. It's clear they knew they were infringing and were making a last-ditch legal CYA attempt.
Re: (Score:2)
Especially the second time, because it was right before they released it. It's clear they knew they were infringing and were making a last-ditch legal CYA attempt.
At best it shows it's clear they felt it was likely they would be accused of infringing, so they were making a last-ditch legal CYA attempt.
Re:forgot to mention... (Score:5, Informative)
OpenAI started casting for voice actors in May 2023. They hired the actress in June 2023. The actress who recorded Sky said they asked her to use her natural voice, never referenced the movie Her, and that no one ever told her she sounded like Scarlett Johansson.
In September 2023, they both released Sky and asked Scarlett Johansson if she wanted to record her voice. She said no. Skip ahead to May 2024 and the release of 4o, and Altman asks Scarlett again. Sky was already in use for more than half a year when the hype of 4o made everyone focus on the voice.
When Scarlett complained, they took down the voice.
https://www.washingtonpost.com... [washingtonpost.com]
Re: (Score:2)
Pretty ridiculous, if you ask me. I'm not sure how anyone can say they sound 'alike', there's an entirely different color to the two voices - pitch, tone and inflection.
Re: (Score:2)
If that's the case then in the general populace, where everyone doesn't sound the same, then this voice would presumably match Johansson's at a rate greater than 98%.
Re: (Score:2)
from another angle . . . this standard means that 1 in 50 actresses *cannot* record their voices for such purposes so as to protect one actress's monopoly.
Add a few more folks with such exalted protection, and, well . . .
Re: (Score:2)
for her to have that right over a resemblance held by 2%, is taking that right away from the 2%.
Also, if she *does* have it, should she in turn be barred in favor of the earlier actresses with the same characteristics? Or be forced to pay them?
I really don't see a workable rule that gives this one particular actress control over all other voices, past and present, with similar attributes.
Hmm, are the other actresses with similar voices also barred from voice work like animation, in case they get mistaken?
Re: (Score:2)
oh, and I suppose that I should mention that it appears that one of those other actresses *did* do exactly that, in the massive casting call which predates the release of the movie that is driving attention to this.
Re: (Score:2)
>For example, we don't have the full facts about how many actors
>were auditioned by OpenAI prior to choosing one or more voice
>providers.
Re:forgot to mention... (Score:5, Insightful)
SJ was dumb to not milk this trend IMO. The Roddenberry kids should license Majel's voice likeness to some competitor.
Maybe she doesn't want her voice associated with software that tells people to attach pizza cheese with glue or eat rocks.
Re: (Score:3)
SJ was dumb to not milk this trend IMO. The Roddenberry kids should license Majel's voice likeness to some competitor.
Maybe she doesn't want her voice associated with software that tells people to attach pizza cheese with glue or eat rocks.
LOL, or maybe she just wants everyone involved to simply stop bullshitting. Vehemently claims it’s not her voice after they tried to literally license her voice? Go ahead OpenAI. Tell the world why you tried to license it without admitting the obvious. This should be more entertaining than any goofy ChatGPT response. Dammit where’s my popcorn app..
Re: (Score:3)
She declined so they hired another middle-age female voice actor with similar acoustic properties.
You got the chronology wrong: she declined in September 2023 (and again in May 2024) but they hired another actor well before that, in June 2023.
Sounds like two different women, just both bubbly (Score:2)
that they tried to get her to license that voice to them. THEY thought it was her.
No, they may have simply wanted Johansson involved for the PR it would add. However Johansson was probably smart enough to stay away from this due to the inevitable bad press that will come from stupid or embarrassing things the voice will ultimately say.
To me the two voices sound like different women. What is similar is their conversational style, young, friendly, bubbly
... a style commonly used at times by young college aged women. In "Her" Johansson did not invent a conversational style, she adopted
Re:forgot to mention... (Score:5, Insightful)
Scenario: I asked for a Coke but the restaurant didn't have any. So I asked for a Pepsi and they brought it to me.
Question: Who should the Coca Cola Company sue over this: Pepsi, the restaurant, or me?
Re: (Score:2)
Meh, Stuart.
It is either Douglas Rain or Peter Tuddenham. Or both.
Re: (Score:2)
I'd rather have Patrick Stewart to read to me.
Everyone has a different preference with respect to who they want to talk flirtatiously to them. DEI based options should apply to the voice. :-)
Re: (Score:2)
Well, some may like "flirtatious", others may prefer a more, errrr... harmoniously developed character.
https://youtube.com/watch?v=Ll... [youtube.com]
Voice vs books and articles (Score:2)
How is it that there is widespread general agreement that if her voice was stolen then the AI guys are bad, but when it is the written word created by the human mind, that is somehow ok with a lot of people?
Re: (Score:2)
> I think there is widespread, though not universal, agreement that the AI just copying a particular creative work or particular voice without permission is problematic
Not on /.
> And also widespread, though not universal, agreement that drawing on a vast database of creative works or voices to produce something based on trends amongst the dataset isn't
What is the difference between copying 1 book and a library? Why does copying many works make it ok?
Re: (Score:2)
What is the difference between copying 1 book and a library? Why does copying many works make it ok?
It isn't the volume of works copied that makes it fair use, it is what you do with those copies. Making copies of anything is perfectly legal. I can make a recording of any movie or replicate any painting I want. But what I do with those copies then determines whether I broke any copyright laws.
If you copy every book in a library, and then open your own library with those books, you have broken the law. But if you copy every book in the library so you can analyze how often certain words are used in books an
Re: (Score:2)
Ok, I'll buy that. I need to think more about what you said but ok.
It's a much better and well thought answer than the usual:
1) it isn't a copy
2) it's for the public good
3) copyright is dumb
4) authors are greedy
5) I'm a corporate shill
6) AI is so important it overrides established law
7) everyone else's work should be free because I've never produced anything worth taking
Thank you.
Re: (Score:2)
I've had some time to think about this further.
You've changed my position about 99% with one edge case exception (see below).
As long as the AI can not be forced through any means to reproduce lengthy (as determined by copyright standards) sections of text, I'm on board with your take on this. We have seen AI that will reproduce long text sections or unique PII. That's super not ok and definitely all sorts of legal violations in my book.
If it returns a link to the full text on some third party free library
How do you stop legal challenges for coincidence? (Score:3)
AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?
You document your intent during design .... (Score:3)
AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?
You document the design and development of the AI voice in real time to show the intent. For example if all the documentation indicates that they want a "young, friendly, bubbly, female voice" then they will likely be safe. A style that is taken from real life, used at times by many college aged women. I think people are confusing the preceding speaking style with whether the two voices sound like the same woman.
Re: (Score:2)
How do you stop legal challenges when it's just a coincidence?
You start by not alluding that one is supposed to sound like the other. Saying, "Her" in a Tweet was a really, really bad idea.
How do you claim coincidence. (Score:2)
AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?
Perhaps the more relevant question is how do you claim coincidence when a specific company tried to previously reach out to this exact human, in order to try and license the exact thing they now claim isn’t in use by same specific company?
Coincidence is going to need a bookie named Mission Impossible to survive those odds.
Re: (Score:2)
the way you do it, as done in a court of law, is to ask "would a reasonable person think this to be SJ's voice?". The answer looks to be a pretty clear "yes" - I'll bet if you asked 100 random people in the street you'd end up with 80+ people thinking it was her.
I hope that isn't all it takes to show you were intentionally misleading people. When my wife and I were first watching "Ready or Not", we both thought we were watching Margot Robbie. If you found 100 random people who don't know who Samara Weaving is and ask them to identify the actress, I'd bet the number who say she's Margot Robbie would be similar to the number of people who listen to the OpenAI Sky voice and think it is Scarlett Johansson.
Re: (Score:2)
The judge says "not more of this stupid shit" and dismisses the case. Eventually people quit paying lawyers to make judges laugh at them.
There are already lots of models around that you can train to mimic a voice with just a bit of source material. I wouldn't be surprised if the future is some generic canned voices plus the option to train your own. Want something that sounds like Scarlett Johansson in Her? Just hit the button and let the model listen for a minute and there you go.
Is analysis including the conversational style? (Score:2)
Johansson did not invent a new conversational style in "Her". She adopted a well-known style that many men would recognize from "back in the day" with fond memories. If Johansson's voice is similar that's one thing, however if it's just
Instead of SJ, get SAL (Score:2)
Candice Bergen might like a little work.
SAL9000 "I would like to ask a question."
Dr. Chandra "What is it?"
SAL9000 "Will I dream?"
Dr. Chandra "Of course you will dream. All intelligent creatures dream. Nobody knows why. Perhaps you will dream of HAL, just as I often do."
https://www.youtube.com/watch?v=T2E7sxGAmuo
Re: (Score:2)
... or any of the voice actresses who did the Ship's Computer voice for Star Trek, or Brent Spiner, or any number of other notable "computer" voices we've had in media.
Personally, I think Brent Spiner as Data would be by far the best voice to use, inflections and all. Though perhaps it's best to stick to a female voice to temper anyone's trust in the accuracy of the data.
So? (Score:3)
They hired a voice actress and based the AI voice off her. Are they going to sue that actress for doing an impression of Scarlett Johansson?
Re: (Score:2)
Doubt they did that either. Most likely it's a combination of pilfered voices. Same as the rest of OpenAI business model.
Re: (Score:2)
So they tried to hire SJ, she said no, and they decided not to hire another actress but instead steal a combination of other people's voices to try and sound like SJ?
Re: (Score:2)
But it doesn't really sound like her, does it? There are similarities, sure. Just like what you'd get if it was a mixture.
And wanting her blessing is not the same as needing her to perform. There's no shortage of source material in the open. Albeit pillaged as it stands. No doubt if SJ had done the deal then OpenAI would've been trumpeting it to press other performers to also sign up.
NPR helping out OpenAI? (Score:2)
In this case, NPR hired experts who found two other actresses whose work could plausibly be captured to disguise the contributions from the so-far un-named "different professional actress using her own natural speaking voice," claimed by OpenAI to provide voice samples to develop the "Sky" voice: Keri Russell and Anne Hathaway. Blending these three voices could produce something distinct but similar to all three-or-four (depending on whether the "Sky" voice is or isn't based on Scarlett Johansson) of these
R.C. Bray's Skippy from ExFor (Score:2)
Does not matter (Score:3)
It does not matter how the voice was created. Just that it appears like Scarlett Johansson to a human and that Altman had tweeted "her" should be enough.
That people think it was her infringes on her legal personality rights [wikipedia.org] as a celebrity
Re:Does not matter (Score:4, Insightful)
Re: (Score:2)
Easy fix (Score:2)
They should just let the voice talk in a Christopher Walken rhythm.
Re: (Score:3)
And ask for more cowbell?
A probability matching score would help (Score:2)
What's missing from either the article or the statistical analysis, IMO, is a probability matching score. I would define this two ways: 1) The likelihood of a match of an AI voice to an actress' voice in the set of all voices. 2) The likelihood that the AI voice matched a randomly generated voice.
The first could be defined by using feature extraction, using either a CNN or linear discriminant analysis, on each segment of the voice sample. That is, the voice sample would be divided into segments. An algori
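For the second definition (how likely a match is against a randomly generated voice), a rough Monte Carlo sketch might look like the following. Big assumptions: voices are represented as feature vectors, similarity is cosine similarity, and "randomly generated voice" is stood in for by a vector of independent Gaussian features. A real null model would need an actual voice generator; the function name and parameters here are hypothetical and not taken from the article or the ASU analysis.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def chance_match_probability(observed, dim, trials=2000, seed=0):
    """Estimate how often two random feature vectors are at least as
    similar as the `observed` score (an empirical p-value).

    Assumption: random Gaussian vectors stand in for random voices.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        u = [rng.gauss(0, 1) for _ in range(dim)]
        v = [rng.gauss(0, 1) for _ in range(dim)]
        if cosine_similarity(u, v) >= observed:
            hits += 1
    return hits / trials

# Example: how often does pure chance reach a similarity of 0.8 in 8 dimensions?
p = chance_match_probability(observed=0.8, dim=8)
print(f"estimated chance-match probability: {p:.4f}")
```

A small value would suggest the observed similarity is unlikely under this (very crude) null model; it says nothing about intent or about whose voice was used.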
Lawyers will likely find in discovery (Score:2)
They can subpoena all the emails and documents from the company and likely they are going to show a plan to mimic Her.
similar how (Score:2)
"AI models can evaluate how similar voices are to each other"
Really? Similar in what metric? Is there psychophysical data to back up that claim?
"Her" movie playing out in real life (Score:2)
The Most Important Question (Score:2)
Re: (Score:2)
*purportedly* My bet would be subpoenaed memos will establish it *was* Scarlett HERself after all, from clips from HER movie.