Our Brains React Differently to Deepfake Voices, Researchers Find (news.uzh.ch)
"University of Zurich researchers have discovered that our brains process natural human voices and "deepfake" voices differently," writes Slashdot reader jenningsthecat.
From the University's announcement: The researchers first used psychoacoustical methods to test how well human voice identity is preserved in deepfake voices. To do this, they recorded the voices of four male speakers and then used a conversion algorithm to generate deepfake voices. In the main experiment, 25 participants listened to multiple voices and were asked to decide whether or not the identities of two voices were the same. Participants either had to match the identity of two natural voices, or of one natural and one deepfake voice.
The deepfakes were correctly identified in two thirds of cases. "This illustrates that current deepfake voices might not perfectly mimic an identity, but do have the potential to deceive people," says Claudia Roswandowitz, first author and a postdoc at the Department of Computational Linguistics.
The researchers then used imaging techniques to examine which brain regions responded differently to deepfake voices compared to natural voices. They successfully identified two regions that were able to recognize the fake voices: the nucleus accumbens and the auditory cortex. "The nucleus accumbens is a crucial part of the brain's reward system. It was less active when participants were tasked with matching the identity between deepfakes and natural voices," says Claudia Roswandowitz. In contrast, the nucleus accumbens showed much more activity when it came to comparing two natural voices.
The complete paper appears in Nature.
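The two-thirds figure above is a raw hit rate; psychoacoustics studies like this one typically report sensitivity (d') instead, which separates genuine discrimination ability from a bias toward answering "different." Here is a minimal sketch of that calculation. The 2/3 hit rate is taken from the article; the false-alarm rate is a made-up number for illustration, not a figure from the study.

```python
# Sensitivity (d') for a same/different voice-identity task.
# d' = z(hit rate) - z(false-alarm rate), using the inverse normal CDF.
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity from hit and false-alarm rates."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

hits = 2 / 3          # deepfakes correctly judged a different identity (from the article)
false_alarms = 0.25   # hypothetical: natural pairs wrongly judged "different"

print(round(d_prime(hits, false_alarms), 2))  # → 1.11
```

A d' near 1 would mean listeners can tell deepfakes from natural voices, but far from reliably, which is consistent with the article's "can deceive people" framing.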
Because (Score:3)
Because people sound like people but deepfakes sound like Scarlett Johansson for some reason?
Maybe one day science will explain it.
Breathing (Score:2)
When we speak, we expel air. While we generally don't notice it (except in the case of Mark Wahlberg), that extra tidbit of information is something we've grown accustomed to over the centuries. It's embedded as part of us without us realizing it. So far, deepfake voices have not been able to successfully replicate that act. We may not be able to explain why the voice doesn't sound right; we just know that it doesn't.
Re: (Score:2)
Chatbots are incredibly ineffective. (Score:2)
The only real-world use case I can see where LLMs would become genuinely productive would be spam, extortion and generating fake content. At best, these algorithmically generated voices have approached the uncanny valley, but they have never been able to cross it. If you watch any online call-in show you
Re:Chatbots are incredibly ineffective. (Score:4, Insightful)
The world of digital audio and synthesizers was at the imitation stage in the 1980s, then improved incrementally over time. Now, many acoustic instruments are reproduced to a level that is *nearly* indistinguishable from the actual acoustic instruments. I say *nearly* because expert musicians could perhaps spot small problems with the sound, but in practical use the "average person" couldn't tell the difference.
Today's electronic pianos, one of the more difficult instruments to get right, are mostly indistinguishable from acoustic. Direct A/B comparison might reveal the differences, but in a recording, you wouldn't know the difference...
I doubt things like stammering or other human foibles won't be matched
Re: (Score:2)
A few thoughts (Score:3)
2) The headline claims this is a study of deepfakes. But it's a study of a particular set of models on particular speakers using a particular version of some deepfake software applied by some particular modelers. It might or might not be the best model that can currently be made today, and there's likely to be a new version tomorrow. It's like doing one car review and drawing the conclusion that "cars have a bumpy ride."
3) they tested the deepfake against the recording it was created from back-to-back (A/B). That's the most difficult setup. If somebody is deepfaking your supposedly-kidnapped granddaughter [theguardian.com] crying for help over the phone, you won't have the luxury of hearing it back to back with live speech (or screaming which you've probably never heard in real life from that person). In fact they'll probably mask any deficiencies with bad audio quality, short clips, etc.
2016 deepfake voices (Score:3)
They're using ancient technology.
"To synthesize deepfake voices, we used the open-source voice conversion (VC) software SPROCKET, which revealed the second-best sound quality scores for same-speaker pairs and the sixth-best quality for speaker similarity rating among 23 conversion systems submitted to the VC challenge in 2018"
Bullshit detectors evolved a long time ago. (Score:2)
Re: (Score:2)
Re: (Score:2)
Deep Fake Halitosis is a feature. (Score:2)
The Deep Fake Flatulence is a bug (stink)