
Researchers Use Fluid Dynamics To Spot Artificial Imposter Voices (theconversation.com)

An anonymous reader quotes a report from The Conversation: To detect audio deepfakes, we and our research colleagues at the University of Florida have developed a technique that measures the acoustic and fluid dynamic differences between voice samples created organically by human speakers and those generated synthetically by computers.

The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract. Luckily, scientists have techniques to estimate what someone -- or some being such as a dinosaur -- would sound like based on anatomical measurements of its vocal tract. We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker's vocal tract during a segment of speech. This allowed us to effectively peer into the anatomy of the speaker who created the audio sample.
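The summary stops short of code, but the inversion it describes is in the spirit of the classic lossless-tube (LPC) model of the vocal tract. Below is a minimal sketch of that textbook approach -- not the authors' actual pipeline -- using librosa for the all-pole fit; the model order, the synthetic test frame, and the helper names are illustrative assumptions.

```python
import numpy as np
import librosa

def lpc_to_reflection(lpc):
    """Step-down (backward Levinson) recursion: LPC polynomial -> reflection coeffs."""
    a = np.array(lpc[1:], dtype=float)        # drop the leading 1
    k = np.zeros(len(a))
    for i in range(len(a) - 1, -1, -1):
        k[i] = a[i]
        if i > 0:
            a = (a[:i] - k[i] * a[i - 1::-1]) / (1.0 - k[i] ** 2)
    return k

def relative_area_function(k):
    """Lossless-tube mapping, k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i).
    Sign/direction conventions vary by textbook; areas here are relative only."""
    areas = [1.0]                              # normalize the first tube section
    for ki in k:
        areas.append(areas[-1] * (1.0 + ki) / (1.0 - ki))
    return np.array(areas)

# One 25 ms "frame" at 16 kHz -- synthetic here; real use would slice speech.
sr = 16000
t = np.arange(int(0.025 * sr)) / sr
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
frame += 0.01 * np.random.randn(frame.size)   # keep the fit well-conditioned

a = librosa.lpc(frame, order=12)              # all-pole vocal tract model
areas = relative_area_function(lpc_to_reflection(a))
print("relative tube areas:", np.round(areas, 2))
```

Note that only the relative area profile is recoverable from audio alone, which is all a plausibility check of the kind described next needs.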

From here, we hypothesized that deepfake audio samples would fail to be constrained by the same anatomical limitations humans have. In other words, deepfaked audio samples would simulate vocal tract shapes that do not exist in people. Our testing results not only confirmed our hypothesis but revealed something interesting. When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect. For instance, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape. This realization demonstrates that deepfake audio, even when convincing to human listeners, is far from indistinguishable from human-generated speech. By estimating the anatomy responsible for creating the observed speech, it's possible to identify whether the audio was generated by a person or a computer.
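As a toy illustration of that drinking-straw observation, a plausibility check in this spirit might look like the following; the area profiles and the spread threshold are invented stand-ins, not values from the paper.

```python
import numpy as np

def looks_straw_like(areas, min_spread=3.0):
    """Flag a tract whose section areas barely vary (a straw-like profile)."""
    areas = np.asarray(areas, dtype=float)
    return areas.max() / areas.min() < min_spread

human_like = np.array([1.0, 2.6, 4.8, 6.1, 3.2, 1.4, 0.9, 2.2])    # wide, variable
straw_like = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1])  # near-constant

print(looks_straw_like(human_like))  # False -> plausible human tract
print(looks_straw_like(straw_like))  # True  -> deepfake red flag
```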

  • by timeOday ( 582209 ) on Saturday October 01, 2022 @10:30AM (#62929011)
    I would imagine a lot of the deepfake detection research going on around the world is unpublished since publishing the detector is a roadmap to defeating it. Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.
    • by malvcr ( 2932649 )

      I was thinking the same ...

      ... when you find a method, you also find the anti-method. And computers can simulate anything if we have enough knowledge and time to figure out how to do it.

    • by NagrothAgain ( 4130865 ) on Saturday October 01, 2022 @11:17AM (#62929081)
      Exactly. The article could just as easily have been titled "researchers develop method to generate even more realistic audio impersonations."
    • This is why I refuse to authenticate with banks using my voice print. There will come a day when simply talking in public becomes a private key breach and your bank account gets hacked.
    • by Joviex ( 976416 )

      I would imagine a lot of the deepfake detection research going on around the world is unpublished since publishing the detector is a roadmap to defeating it. Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.

      A recorded voice has organic vocalizations that computers can't emulate. Regardless of how good you think you made your simulation, short of cutting out or growing custom organic vocal cords and pumping the output through them, you are not going to get the subtle oscillations of organic resonances.

      • Re: Arms race (Score:5, Informative)

        by ShooterNeo ( 555040 ) on Saturday October 01, 2022 @12:39PM (#62929261)

        Bruh, do you even ML? In short: no. You don't need to make meat voiceboxes to train a neural network to make the correct sounds. You literally just need what this paper has - a detector - and then you use its signal during training. The neural network will figure it out somehow (how it does it "depends"). Something like the toy loop below.
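        A sketch in PyTorch - every module here is a tiny stand-in, and the shapes and loss are illustrative, not any real vocoder or the paper's detector:

```python
import torch
import torch.nn as nn

# Stand-in generator (latent -> audio frame) and a stand-in published
# detector (audio frame -> "fake" logit). Real systems are far larger.
vocoder = nn.Sequential(nn.Linear(64, 256), nn.Tanh(), nn.Linear(256, 400))
detector = nn.Sequential(nn.Linear(400, 64), nn.ReLU(), nn.Linear(64, 1))
for p in detector.parameters():
    p.requires_grad_(False)                  # detector is fixed: it was published
opt = torch.optim.Adam(vocoder.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(32, 64)                  # batch of random latent codes
    fake = vocoder(z)                        # candidate audio frames
    fake_logit = detector(fake)              # detector's fake-ness score
    # Reward the vocoder for frames the detector scores as real: this is
    # the feedback loop that erodes any detector once it is published.
    loss = nn.functional.softplus(fake_logit).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```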

      • by davidwr ( 791652 )

        Deepfakes don't have to be perfect; they just have to be good enough to "pass" as real.

        But let's suppose you are right, and the only way to "pass" is to use real or lab-grown vocal cords. If that's what it takes, you can be pretty sure that CIA-type organizations around the world and deep-pocketed private companies are already trying to do exactly this.

      • Man no. Just no. We have yet to find an audio source we can't model.

        Audio is just sine waves, my dude, lots of sine waves. And we've known how to do those since Ptolemy's table of chords, and we've known how to work out what those sine waves are since Fourier in the early 1800s.

        Ultimately you really just need the formant frequencies and the overtones, and you'll derive the parameters of your artificial voicebox from that. For instance, see the sketch below.
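        A toy spectrum-peeling example - the signal is synthetic, and simple peak picking stands in for real formant tracking:

```python
import numpy as np
from scipy.signal import find_peaks

sr = 16000
t = np.arange(sr) / sr                       # one second of signal
# Crude "vowel": 120 Hz fundamental plus energy bumps near two formants.
x = sum(np.sin(2 * np.pi * f * t) * a
        for f, a in [(120, 1.0), (700, 0.6), (1200, 0.4)])

spectrum = np.abs(np.fft.rfft(x))            # Fourier gives the sine content
freqs = np.fft.rfftfreq(len(x), d=1 / sr)

peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.1)
print("dominant components (Hz):", freqs[peaks])   # -> 120, 700, 1200
```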

    • Just insert the detection algorithm into the GAN chain that is used to generate the deepfake, and the problem is - unfortunately - solved. There is currently no way, AFAIK, to stop this.
      • Not necessarily. If the AI doesn't have enough parameters to tweak, it may solve one issue but cause another in the process.

    • Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.

      I'm not an audio engineer, but I imagine it would depend on the bit rate of the fake/digital sound and the capability of the analysis gear. Human speech, and natural sound, is analog and continuous...

    • by Kisai ( 213879 )

      Nah. 100% of deepfakes give themselves away:
      1. They lack emotional depth. SOTA (state of the art) systems still cannot emote, express sarcasm, laugh, or scream on cue; they can only do these in post-processing. If you were to engage in real time, the deepfake would fail every time.
      2. Most deepfake audio is deliberately done with poor-quality audio. This is because the voice systems used are often sampled at only 16 kHz, not the 48 kHz or 96 kHz that would be necessary to fool the human ear AND digital forensics.

  • What is an "Artificial Imposter"? Is it a real person or a robot that's pretending to be an imposter?

    • by hey! ( 33014 )

      It's a simulation of a human rather than an actual human performing the imposture.

  • by BytePusher ( 209961 ) on Saturday October 01, 2022 @12:14PM (#62929181) Homepage
    Just another filter to use for adversarial training.
  • All you need to do is set up a loop with the "deepfake detection" algorithm against deepfake audio creation, and in a few weeks you won't be able to detect a difference anymore.
