
Researchers Use Fluid Dynamics To Spot Artificial Imposter Voices (theconversation.com)

An anonymous reader quotes a report from The Conversation: To detect audio deepfakes, we and our research colleagues at the University of Florida have developed a technique that measures the acoustic and fluid dynamic differences between voice samples created organically by human speakers and those generated synthetically by computers.

The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract. Luckily, scientists have techniques to estimate what someone -- or some being such as a dinosaur -- would sound like based on anatomical measurements of its vocal tract. We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker's vocal tract during a segment of speech. This allowed us to effectively peer into the anatomy of the speaker who created the audio sample.
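The summary stops short of code, but the inversion it describes is in the spirit of the classic lossless-tube (LPC) model of the vocal tract. Below is a minimal sketch of that textbook approach -- not the authors' actual pipeline -- using librosa for the all-pole fit; the model order, the synthetic test frame, and the helper names are illustrative assumptions.

```python
import numpy as np
import librosa

def lpc_to_reflection(lpc):
    """Step-down (backward Levinson) recursion: LPC polynomial -> reflection coeffs."""
    a = np.array(lpc[1:], dtype=float)        # drop the leading 1
    k = np.zeros(len(a))
    for i in range(len(a) - 1, -1, -1):
        k[i] = a[i]
        if i > 0:
            a = (a[:i] - k[i] * a[i - 1::-1]) / (1.0 - k[i] ** 2)
    return k

def relative_area_function(k):
    """Lossless-tube mapping, k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i).
    Sign/direction conventions vary by textbook; areas here are relative only."""
    areas = [1.0]                              # normalize the first tube section
    for ki in k:
        areas.append(areas[-1] * (1.0 + ki) / (1.0 - ki))
    return np.array(areas)

# One 25 ms "frame" at 16 kHz -- synthetic here; real use would slice speech.
sr = 16000
t = np.arange(int(0.025 * sr)) / sr
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
frame += 0.01 * np.random.randn(frame.size)   # keep the fit well-conditioned

a = librosa.lpc(frame, order=12)              # all-pole vocal tract model
areas = relative_area_function(lpc_to_reflection(a))
print("relative tube areas:", np.round(areas, 2))
```

Note that only the relative area profile is recoverable from audio alone, which is all a plausibility check of the kind described next needs.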

From here, we hypothesized that deepfake audio samples would fail to be constrained by the same anatomical limitations humans have. In other words, deepfaked audio samples would simulate vocal tract shapes that do not exist in people. Our testing results not only confirmed our hypothesis but revealed something interesting. When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect. For instance, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape. This realization demonstrates that deepfake audio, even when convincing to human listeners, is far from indistinguishable from human-generated speech. By estimating the anatomy responsible for creating the observed speech, it's possible to identify whether the audio was generated by a person or a computer.
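As a toy illustration of that drinking-straw observation, a plausibility check in this spirit might look like the following; the area profiles and the spread threshold are invented stand-ins, not values from the paper.

```python
import numpy as np

def looks_straw_like(areas, min_spread=3.0):
    """Flag a tract whose section areas barely vary (a straw-like profile)."""
    areas = np.asarray(areas, dtype=float)
    return areas.max() / areas.min() < min_spread

human_like = np.array([1.0, 2.6, 4.8, 6.1, 3.2, 1.4, 0.9, 2.2])    # wide, variable
straw_like = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1])  # near-constant

print(looks_straw_like(human_like))  # False -> plausible human tract
print(looks_straw_like(straw_like))  # True  -> deepfake red flag
```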

  • by timeOday ( 582209 ) on Saturday October 01, 2022 @10:30AM (#62929011)
    I would imagine a lot of the deepfake detection research going on around the world is unpublished since publishing the detector is a roadmap to defeating it. Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.
    • by malvcr ( 2932649 )

      I was thinking the same ...

      ... when you find a method, you also find the anti-method. And computers can simulate anything if we have enough knowledge and time to figure out how to do it.

    • by NagrothAgain ( 4130865 ) on Saturday October 01, 2022 @11:17AM (#62929081)
      Exactly. The article could just as easily have been titled "researchers develop method to generate even more realistic audio impersonations."
    • This is why I refuse to authenticate with banks using my voice print. There will come a day when simply talking in public becomes a private key breach and your bank account gets hacked.
    • by Joviex ( 976416 )

      I would imagine a lot of the deepfake detection research going on around the world is unpublished since publishing the detector is a roadmap to defeating it. Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.

      A recorded voice has organic vocalizations that computers can't emulate. Regardless of how good you think you made your simulation, short of cutting out or growing custom organic vocal cords and pumping the output through them, you are not going to get the subtle oscillations of organic resonances.

      • Re: Arms race (Score:5, Informative)

        by ShooterNeo ( 555040 ) on Saturday October 01, 2022 @12:39PM (#62929261)

        Bruh, do you even ML? In short: no. You don't need to make meat voiceboxes to train a neural network to make the correct sounds. You literally just need what this paper has - a detector - and then you use its signal during training. The neural network will figure it out somehow (how it does it "depends"). Something like the toy loop below.
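        A sketch in PyTorch - every module here is a tiny stand-in, and the shapes and loss are illustrative, not any real vocoder or the paper's detector:

```python
import torch
import torch.nn as nn

# Stand-in generator (latent -> audio frame) and a stand-in published
# detector (audio frame -> "fake" logit). Real systems are far larger.
vocoder = nn.Sequential(nn.Linear(64, 256), nn.Tanh(), nn.Linear(256, 400))
detector = nn.Sequential(nn.Linear(400, 64), nn.ReLU(), nn.Linear(64, 1))
for p in detector.parameters():
    p.requires_grad_(False)                  # detector is fixed: it was published
opt = torch.optim.Adam(vocoder.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(32, 64)                  # batch of random latent codes
    fake = vocoder(z)                        # candidate audio frames
    fake_logit = detector(fake)              # detector's fake-ness score
    # Reward the vocoder for frames the detector scores as real: this is
    # the feedback loop that erodes any detector once it is published.
    loss = nn.functional.softplus(fake_logit).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```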

      • by davidwr ( 791652 )

        Deepfakes don't have to be perfect; they just have to be good enough to "pass" as real.

        But let's suppose you are right, and the only way to "pass" is to use real or lab-grown vocal cords. If that's what it takes, you can be pretty sure that CIA-type organizations around the world and deep-pocketed private companies are already trying to do exactly this.

      • Man no. Just no. We have yet to find an audio source we can't model.

        Audio is just sine waves, my dude, lots of sine waves. And we've known how to do those since Ptolemy's table of chords, and we've known how to work out what those sine waves are since Fourier in the early 1800s.

        Ultimately you really just need the formant frequencies and the overtones, and you'll derive the parameters of your artificial voicebox from that. For instance, see the sketch below.
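        A toy spectrum-peeling example - the signal is synthetic, and simple peak picking stands in for real formant tracking:

```python
import numpy as np
from scipy.signal import find_peaks

sr = 16000
t = np.arange(sr) / sr                       # one second of signal
# Crude "vowel": 120 Hz fundamental plus energy bumps near two formants.
x = sum(np.sin(2 * np.pi * f * t) * a
        for f, a in [(120, 1.0), (700, 0.6), (1200, 0.4)])

spectrum = np.abs(np.fft.rfft(x))            # Fourier gives the sine content
freqs = np.fft.rfftfreq(len(x), d=1 / sr)

peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.1)
print("dominant components (Hz):", freqs[peaks])   # -> 120, 700, 1200
```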

    • Just insert the detection algorithm into the GAN chain that is used to generate the deepfake, and the problem is - unfortunately - solved. There is currently no way, AFAIK, to stop this.
      • Not necessarily. If the AI doesn't have enough parameters to tweak, it may solve one issue but cause another in the process.

    • Ultimately a recorded sound is a sequence of bytes and there's no theoretical reason it couldn't be faked perfectly.

      I'm not an audio engineer, but I imagine it would depend on the bit rate of the fake/digital sound and the capability of the analysis gear. Human speech, and natural sound, is analog and continuous...

    • by Kisai ( 213879 )

      Nah. 100% of deepfakes give themselves away:
      1. They lack emotional depth. SOTA (state of the art) systems still cannot emote, express sarcasm, laugh, or scream on cue; they can only do these in post-processing. If you were to engage in real time, the deepfake would fail every time.
      2. Most deepfake audio is deliberately done with poor-quality audio. This is because the voice systems used are often sampled at only 16 kHz, not the 48 kHz or 96 kHz that would be necessary to fool the human ear AND digital forensics.

  • What is an "Artificial Imposter"? Is it a real person or a robot that's pretending to be an imposter?

    • by hey! ( 33014 )

      It's a simulation of a human rather than an actual human performing the imposture.

  • by BytePusher ( 209961 ) on Saturday October 01, 2022 @12:14PM (#62929181) Homepage
    Just another filter to use for adversarial training.
  • All you need to do is set up a loop with the "deepfake detection" algorithm against deepfake audio creation, and in a few weeks you won't be able to detect a difference anymore.
