
AI Researchers Analyze Similarities of Scarlett Johansson's Voice to OpenAI's 'Sky' (npr.org)

AI models can evaluate how similar voices are to each other. So NPR asked forensic voice experts at Arizona State University to compare the voice and speech patterns of OpenAI's "Sky" to Scarlett Johansson's... The researchers measured Sky, based on audio from demos OpenAI delivered last week, against the voices of around 600 professional actresses. They found that Johansson's voice is more similar to Sky than 98% of the other actresses.

Yet she wasn't always the top hit in the multiple AI models that scanned the Sky voice. The researchers found that Sky was also reminiscent of other Hollywood stars, including Anne Hathaway and Keri Russell. The analysis of Sky often rated Hathaway and Russell as being even more similar to the AI than Johansson.
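NPR does not describe the ASU pipeline in detail, so what follows is only a minimal sketch of the kind of ranking the summary describes. It assumes each voice has already been reduced to a fixed-length embedding vector by some speaker model. The vectors, the speaker names, and the helpers `cosine_similarity`, `rank_speakers` and `percentile_rank` are all hypothetical. It scores every candidate against the target voice and reports what fraction of the other candidates a given one beats. With about 600 candidates, "more similar than 98%" still leaves room for roughly a dozen to rank higher, which is consistent with Hathaway and Russell sometimes outscoring Johansson.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_speakers(target: list[float], pool: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the target."""
    scores = {name: cosine_similarity(target, emb) for name, emb in pool.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def percentile_rank(target: list[float], pool: dict[str, list[float]], name: str) -> float:
    """Fraction of the *other* speakers that are strictly less similar to the
    target than `name` is (0.98 means "more similar than 98% of the others")."""
    scores = {n: cosine_similarity(target, emb) for n, emb in pool.items()}
    ref = scores[name]
    others = [s for n, s in scores.items() if n != name]
    return sum(s < ref for s in others) / len(others)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real speaker embeddings are much larger.
    target = [1.0, 0.0, 0.0]  # hypothetical AI voice
    pool = {
        "speaker_a": [0.9, 0.1, 0.0],
        "speaker_b": [0.0, 1.0, 0.0],
        "speaker_c": [0.5, 0.5, 0.0],
        "speaker_d": [0.0, 0.0, 1.0],
    }
    for name, score in rank_speakers(target, pool):
        print(f"{name}: {score:.3f}")
    print(percentile_rank(target, pool, "speaker_c"))
```

Here `speaker_c` is less similar than `speaker_a` but more similar than the other two, so `percentile_rank` returns 2/3. Swapping in a different embedding model changes every score. That is one plausible reason the top hit varied between the models the researchers used.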

The lab study shows that the voices of Sky and Johansson have undeniable commonalities — something many listeners believed, and that now can be supported by statistical evidence, according to Arizona State University computer scientist Visar Berisha, who led the voice analysis in the school's College of Health Solutions and the College of Engineering. "Our analysis shows that the two voices are similar but likely not identical," Berisha said...

OpenAI maintains that Sky was not created with Johansson in mind, saying it was never meant to mimic the famous actress. "It's not her voice. It's not supposed to be. I'm sorry for the confusion. Clearly you think it is," Altman said at a conference this week. He said whether one voice is really similar to another will always be the subject of debate.

  • by XaXXon ( 202882 ) on Sunday June 02, 2024 @08:39PM (#64518743)

    that they tried to get her to license that voice to them.

    THEY thought it was her.

    • by 93 Escort Wagon ( 326346 ) on Sunday June 02, 2024 @08:52PM (#64518765)

      TWICE. Before "Sky" they approached Johansson twice, trying to license her voice.

      Whatever they claim, it's a bad look.

      • Maybe they like her voice. Other people do.

        • by drnb ( 2434720 ) on Sunday June 02, 2024 @10:31PM (#64518909)

          Maybe they like her voice. Other people do.

          It's not the voice so much, it's the style of speaking. The style sounds like a college-aged woman who is on the verge of becoming flirty. Many, many women have on occasion adopted that style because people like it; they recognize and understand it.

          • "Adopted that style"? Seriously? People change their voice on purpose for some perceived benefit?
            • No one said that it was on purpose. And yes, that's how it usually works. When acting in a certain way gets you what you want, you tend to do that more.
              • by Ol Olsoc ( 1175323 ) on Monday June 03, 2024 @06:57AM (#64519421)

                No one said that it was on purpose. And yes, that's how it usually works. When acting in a certain way gets you what you want, you tend to do that more.

                Sure. There's the talking to babies voice, the romantic voice, lots of different voices.

                And not just women. I have my normal voice, my angry voice, which instead of yelling, is low and monotone. People tell me it is pretty scary. And then there is my "obey now" voice, which is apparently shocking, because if you are used to my normal soft spoken voice, it freaks you out. That's reserved for emergencies.

              • Well that's what we call "fake people".
                • by Alascom ( 95042 )

                  No, this is not being fake; it is learning, adapting to the audience, and using effective communication. If people are often confused when I say I need to use the loo or the baño, but "bathroom" is clearly understood, then I will say "bathroom" more often because it communicates effectively. The same is true of tone of voice, facial expression, and hand gestures.

                • by drnb ( 2434720 )

                  Well that's what we call "fake people".

                  Absolutely not. Vocal communication is more than just words. Emotions are conveyed via style. Hence emoticons and emojis invented for textual conversations. ;-)

                  • Yet people manage to communicate on Slashdot just fine without emojis. You can't even hear the inflection in my voice yet we are communicating just fine.
                    • by drnb ( 2434720 )

                      Yet people manage to communicate on Slashdot just fine without emojis. You can't even hear the inflection in my voice yet we are communicating just fine.

                      We do have emoticons. Consider this example I used in a different response:

                      "If a person is friendly or angry you can often tell from their speech regardless of language. Except German, that always sounds like an angry person. ;-)."

                    • And every time I see that I think it's nice that Slashdot still has a following in their teens.
            • "Adopted that style"? Seriously? People change their voice on purpose for some perceived benefit?

              Sure. Ever hear women adopting "vocal fry"? Drop their natural voice an octave and make it sound gravelly. Someone told them it makes them sound smart or man-like. Mostly it makes them sound stupid.

              Or there is the case of Elizabeth Holmes of Theranos infamy. She didn't use vocal fry, but dropped her voice. I guess she thought it made her sound smart or something. Which is strange - they have recordings of her speaking in her natural voice, and her real voice is nice, a lot nicer than the weird affected v

              • Sure, some people are fake people or even sociopaths I guess. But it's more of a character flaw than a common occurrence. Maybe it has been more normalised in the US. I find people who live in big cities tend to be more concerned about their image or getting things from people.
                • Sure, some people are fake people or even sociopaths I guess. But it's more of a character flaw than a common occurrence. Maybe it has been more normalised in the US. I find people who live in big cities tend to be more concerned about their image or getting things from people.

                  Not really. It's how humans convey emotions in speech. Speech is multichannel: there are the words, which have a specific meaning, but sometimes the words do not convey the emotional content. That is where tone of voice and style of speech come into play. That emotional channel might even be universal. If a person is friendly or angry you can often tell from their speech regardless of language. Except German, that always sounds like an angry person. ;-). See the emoticon? What does it do? It communicates

            • Seriously? People change their voice on purpose for some perceived benefit?

              Yes. They are called "actors".

          • To me it sounds quite repulsive, very valley-girl-like.

      • by careysub ( 976506 ) on Sunday June 02, 2024 @09:59PM (#64518865)

        OpenAI asserts now that "It's not her voice. It's not supposed to be", yet when the voice dropped, CEO Altman Xittered "Her". Because that is exactly what a CEO does to show that the voice is not supposed to be confused with the actress voicing Her.

      • by SeaFox ( 739806 )

        Especially the second time, because it was right before they released it. It's clear they knew they were infringing and were making a last-ditch legal CYA attempt.

        • by ranton ( 36917 )

          Especially the second time, because it was right before they released it. It's clear they knew they were infringing and were making a last-ditch legal CYA attempt.

          At best it shows it's clear they felt it was likely they would be accused of infringing, so they were making a last-ditch legal CYA attempt.

      • by Col. Klink (retired) ( 11632 ) on Sunday June 02, 2024 @11:46PM (#64518973)

        OpenAI started casting for voice actors in May 2023. They hired the actress in June 2023. The actress who recorded Sky said they asked her to use her natural voice, never referenced the movie Her, and that no one ever told her she sounded like Scarlett Johansson.

        In September 2023, they both released Sky and asked Scarlett Johansson if she wanted to record her voice. She said no. Skip ahead to May 2024 and the release of 4o, and Altman asks Scarlett again. Sky was already in use for more than half a year when the hype of 4o made everyone focus on the voice.

        When Scarlett complained, they took down the voice.

        https://www.washingtonpost.com... [washingtonpost.com]

        • by CAIMLAS ( 41445 )

          Pretty ridiculous, if you ask me. I'm not sure how anyone can say they sound 'alike', there's an entirely different color to the two voices - pitch, tone and inflection.

          • They said "more alike than 98% of other actresses". Given that most actresses are around her age and of similar... body type, that's not really that 'alike'.
            • If most actresses share a lot in common with Johansson, including their voices, and yet despite that this voice still matches Johansson's more closely than 98% of other actresses, then that's an even stronger relation.

              If that's the case, then in the general populace, where not everyone sounds the same, this voice would presumably match Johansson's at a rate greater than 98%.
              • 98% is not very high. If they tested 100 actresses there are still two others that sound extremely similar to her. That alone demonstrates that her voice is not unique to her.
            • by hawk ( 1151 )

              from another angle . . . this standard means that 1 in 50 actresses *cannot* record their voices for such purposes so as to protect one actress's monopoly.

              Add a few more folks with such exalted protection, and, well . . .

                • So what if one of those other two actresses gives their voice to ChatGPT? Does Johansson then have a right to sue?
                • by hawk ( 1151 )

                  for her to have that right over a resemblance held by 2% is taking that right away from the 2%.

                  Also, if she *does* have it, should she in turn be barred in favor of the earlier actresses with the same characteristics? Or be forced to pay them?

                  I really don't see a workable rule that gives this one particular actress control over all other voices, past and present, with similar attributes.

                  Hmm, are the other actresses with similar voices also barred from voice work like animation, in case they get mistaken?

                  • by hawk ( 1151 )

                    oh, and I suppose that I should mention that it appears that one of those other actresses *did* do exactly that, in the massive casting call which predates the release of the movie that is driving attention to this.

        • It would be helpful to have the full facts investigated. For example, we don't have the full facts about how many actors were auditioned by OpenAI prior to choosing one or more voice providers. We know that the movie Her was not referenced during the hiring and recordings for this actress, and probably not during the hiring and recordings of any of the other actresses who also gave alternative readings before the final choices were made. We also don't know who later went over the complete list of recording
          • by hawk ( 1151 )

            >For example, we don't have the full facts about how many actors were auditioned by OpenAI prior to choosing one or more voice providers.

            On May 10, 2023, the casting agency and our casting directors issued a call for talent. In under a week, they received over 400 submissions from voice and screen actors. To audition, actors were given a script of ChatGPT responses and were asked to record them.

            . . .

            Through May 2023, the casting team independently reviewed and hand-selected an initial list of 14 ac

    • that they tried to get her to license that voice to them. THEY thought it was her.

      No, they may have simply wanted Johansson involved for the PR it would add. However Johansson was probably smart enough to stay away from this due to the inevitable bad press that will come from stupid or embarrassing things the voice will ultimately say.

      To me the two voices sound like different women. What is similar is their conversational style: young, friendly, bubbly ... a style commonly used at times by young, college-aged women. In "Her", Johansson did not invent a conversational style, she adopted

    • by Sloppy ( 14984 ) on Monday June 03, 2024 @10:11AM (#64519885)

      Scenario: I asked for a Coke but the restaurant didn't have any. So I asked for a Pepsi and they brought it to me.

      Question: Who should the Coca Cola Company sue over this: Pepsi, the restaurant, or me?

  • How is it that there is widespread agreement that if her voice was stolen then the AI guys are bad, but when it is the written word created by the human mind, that is somehow OK with a lot of people?

  • AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?

    • AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?

      You document the design and development of the AI voice in real time to show the intent. For example, if all the documentation indicates that they want a "young, friendly, bubbly, female voice", then they will likely be safe, since that is a style taken from real life and used at times by many college-aged women. I think people are confusing that speaking style with whether the two voices sound like the same woman.

    • How do you stop legal challenges when it's just a coincidence?

      You start by not implying that one is supposed to sound like the other. Saying "Her" in a Tweet was a really, really bad idea.

    • AIs will sound and look (eventually) like someone that currently exists or existed. How do you stop legal challenges when it's just a coincidence?

      Perhaps the more relevant question is how do you claim coincidence when a specific company tried to previously reach out to this exact human, in order to try and license the exact thing they now claim isn’t in use by same specific company?

      Coincidence is going to need a bookie named Mission Impossible to survive those odds.

    • by ceoyoyo ( 59147 )

      The judge says "not more of this stupid shit" and dismisses the case. Eventually people quit paying lawyers to make judges laugh at them.

      There are already lots of models around that you can train to mimic a voice with just a bit of source material. I wouldn't be surprised if the future is some generic canned voices plus the option to train your own. Want something that sounds like Scarlett Johansson in Her? Just hit the button, let the model listen for a minute, and there you go.

  • Is the analysis including the conversational style? That would seem to be a mistake. When I listen to both, it sounds like two different women; however, both are using a similar young, friendly, bubbly style, one that sounds to me like many other real-life women from college days.

    Johansson did not invent a new conversational style in "Her". She adopted a well-known style that many men would recognize from "back in the day" with fond memories. If Johansson's voice is similar, that's one thing; however, if it's just
    • by rogersc ( 622395 )
      Yes, similarity does not prove copying. Maybe the Her movie tried to copy a particular well-known voice style, and maybe OpenAI tried to mimic that same style.
  • Candice Bergen might like a little work.
    SAL9000 "I would like to ask a question."
    Dr. Chandra "What is it?"
    SAL9000 "Will I dream?"
    Dr. Chandra "Of course you will dream. All intelligent creatures dream. Nobody knows why. Perhaps you will dream of HAL, just as I often do."
    https://www.youtube.com/watch?v=T2E7sxGAmuo

    • by CAIMLAS ( 41445 )

      ... or any of the voice actresses who did the Ship's Computer voice for Star Trek, or Brent Spiner, or any number of other notable "computer" voices we've had in media.

      Personally, I think Brent Spiner as Data would be by far the best voice to use, inflections and all. Though perhaps it's best to stick to a female voice to temper anyone's trust in the accuracy of the data.

  • by systemd-anonymousd ( 6652324 ) on Sunday June 02, 2024 @11:03PM (#64518947)

    They hired a voice actress and based the AI voice off her. Are they going to sue that actress for doing an impression of Scarlett Johansson?

    • by evanh ( 627108 )

      Doubt they did that either. Most likely it's a combination of pilfered voices. Same as the rest of OpenAI business model.

      • So they tried to hire SJ, she said no, and they decided not to hire another actress but instead steal a combination of other people's voices to try and sound like SJ?

        • by evanh ( 627108 )

          But it doesn't really sound like her, does it? There are similarities, sure. Just like what you'd get if it were a mixture.

          And wanting her blessing is not the same as needing her to perform. There's no shortage of source material in the open. Albeit pillaged as it stands. No doubt if SJ had done the deal then OpenAI would've been trumpeting it to press other performers to also sign up.

  • In this case, NPR hired experts who found two other actresses whose work could plausibly be captured to disguise the contributions from the so-far unnamed "different professional actress using her own natural speaking voice," claimed by OpenAI to provide voice samples to develop the "Sky" voice: Keri Russell and Anne Hathaway. Blending these three voices could produce something distinct but similar to all three-or-four (depending on whether the "Sky" voice is or isn't based on Scarlett Johansson) of these

  • The real tragedy here is that they didn't get R.C. Bray to do Skippy from the Expeditionary Force series. Even more fitting since most ML answers are a “solid schmaybe”. (That you know of.) For those who haven't heard the audiobooks, Skippy is an absent-minded, snarky AI who is an asshole. RC Bray patterned him off of Frasier and he is awesome.
  • by Misagon ( 1135 ) on Monday June 03, 2024 @03:58AM (#64519157)

    It does not matter how the voice was created. Just that it appears like Scarlett Johansson to a human and that Altman had tweeted "her" should be enough.

    That people think it was her infringes on her legal personality rights [wikipedia.org] as a celebrity

    • Re:Does not matter (Score:4, Insightful)

      by Whateverthisis ( 7004192 ) on Monday June 03, 2024 @08:53AM (#64519653)
      This. What are these so-called "AI Researchers" thinking? This is a self-created problem by OpenAI, and it's about the impression that people got, most specifically Scarlett Johansson herself. More data and analysis is not going to make this problem go away because it's not a data problem; it's a perceived-lack-of-trust problem, and more "analysis" in an attempt to deflect the issue is only going to make them look worse. Take a page from the CDC on this: the CDC overall did a bang-up job on tackling Covid, but they absolutely and completely messed up the public messaging space by acting like scientists. These researchers are fanning the flames of the problem, specifically by calling out new actresses as being more similar.
    • Does it sound like her voice or her "Her" voice? If it sounds more like her character's voice, then she may not own the rights to it.
  • They should just let the voice talk in a Christopher Walken rhythm.

  • What's missing from either the article or the statistical analysis, IMO, is a probability matching score. I would define this two ways: 1) The likelihood of a match of an AI voice to an actress' voice in the set of all voices. 2) The likelihood that the AI voice matched a randomly generated voice.

    The first could be defined by using feature extraction, using either a CNN or linear discriminant analysis, on each segment of the voice sample. That is, the voice sample would be divided into segments. An algori
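The second definition above, the chance that a randomly generated voice would match the AI voice at least as well as a given actress does, can be estimated with a Monte Carlo null. This is not the commenter's algorithm (the comment is cut off before it). It is only an illustrative sketch that assumes voices live in a shared embedding space and models a "random voice" as an isotropic Gaussian vector, which real voices certainly are not. The function names and example vectors are hypothetical. It scores with cosine similarity and returns an add-one-smoothed empirical p-value.

```python
import math
import random


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def random_voice_embedding(dim: int, rng: random.Random) -> list[float]:
    """Hypothetical 'random voice': each coordinate drawn from a standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def match_p_value(target: list[float], observed_score: float,
                  n_samples: int = 10_000, seed: int = 0) -> float:
    """Estimated probability that a random voice scores at least `observed_score`
    against `target`. The +1 terms keep the estimate from ever being exactly zero."""
    rng = random.Random(seed)
    dim = len(target)
    hits = sum(
        cosine_similarity(target, random_voice_embedding(dim, rng)) >= observed_score
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    ai_voice = [0.6, 0.8, 0.0, 0.0]  # hypothetical AI-voice embedding
    actress = [0.5, 0.8, 0.1, 0.2]   # hypothetical actress embedding
    score = cosine_similarity(ai_voice, actress)
    print(f"similarity={score:.3f}, p={match_p_value(ai_voice, score):.4f}")
```

A small p-value here only says the match is unlikely under this crude null. How small it gets depends heavily on the embedding dimension and on the choice of null model.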

  • They can subpoena all the emails and documents from the company and likely they are going to show a plan to mimic Her.

  • "AI models can evaluate how similar voices are to each other"

    Really? Similar in what metric? Is there psychophysical data to back up that claim?

  • We all knew the premise was going to happen eventually, I just didn't think her voice would be indirectly involved.
  • The most important question is how similar does it sound to the voice actress that OpenAI purportedly used to create the voice?
    • *purportedly* My bet would be subpoenaed memos will establish it *was* Scarlett HERself after all, from clips from HER movie.
