An anonymous reader quotes a report from The Conversation: To detect audio deepfakes, we and our research colleagues at the University of Florida have developed a technique that measures the acoustic and fluid dynamic differences between voice samples created organically by human speakers and those generated synthetically by computers.
The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract. Luckily, scientists have techniques to estimate what someone — or some being such as a dinosaur — would sound like based on anatomical measurements of its vocal tract. We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker’s vocal tract during a segment of speech. This allowed us to effectively peer into the anatomy of the speaker who created the audio sample.
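The article does not include code, but the general flavor of this kind of inverse estimation can be sketched with a classic signal-processing approach: model the vocal tract as a chain of lossless tubes and recover relative cross-sectional areas from linear prediction (LPC) reflection coefficients. The snippet below is only a minimal illustration of that idea, not the researchers' actual pipeline; the file name, LPC order, and frame length are assumptions.

```python
# Sketch: estimate relative vocal-tract cross-sectional areas for one short
# speech frame via LPC reflection coefficients (lossless-tube model).
# Illustrative only; "speech.wav", order=12, and the 25 ms window are assumed.
import numpy as np
import librosa

def lpc_to_reflection(a):
    """Step-down (backward Levinson) recursion: LPC polynomial -> reflection coeffs."""
    a = np.array(a, dtype=float)          # a[0] == 1.0
    order = len(a) - 1
    k = np.zeros(order)
    for m in range(order, 0, -1):
        k[m - 1] = a[m]
        if abs(k[m - 1]) >= 1.0:          # numerically unstable frame; stop early
            break
        lower = (a[1:m] - k[m - 1] * a[m - 1:0:-1]) / (1.0 - k[m - 1] ** 2)
        a = np.concatenate(([1.0], lower))
    return k

def vocal_tract_areas(frame, order=12):
    """Relative tube areas for one windowed speech frame (glottis to lips)."""
    a = librosa.lpc(frame, order=order)
    k = lpc_to_reflection(a)
    areas = [1.0]                          # normalized area at the glottis
    for r in k:
        # Area ratio between adjacent tube sections; the sign convention for
        # reflection coefficients varies across texts, so treat this as relative.
        areas.append(areas[-1] * (1.0 + r) / (1.0 - r))
    return np.array(areas)

y, sr = librosa.load("speech.wav", sr=16000)
frame = y[:400] * np.hamming(400)          # roughly a 25 ms analysis window
print(vocal_tract_areas(frame))
```

A real system would do this frame by frame across an utterance and calibrate the relative areas against physical units, but the inverse step itself is the same shape as the sketch above.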
From here, we hypothesized that deepfake audio samples would fail to be constrained by the same anatomical limitations humans have. In other words, we expected deepfake audio to imply vocal tract shapes that do not exist in people. Our testing results not only confirmed our hypothesis but revealed something interesting. When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect. For instance, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape. This realization demonstrates that deepfake audio, even when convincing to human listeners, is far from indistinguishable from human-generated speech. By estimating the anatomy responsible for creating the observed speech, it’s possible to identify whether the audio was generated by a person or a computer.
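To make the detection step concrete, one could imagine thresholding the estimated cross-sections against a plausible human range and flagging utterances that fall outside it, much as the straw-versus-human contrast above suggests. The sketch below is a hedged illustration of that idea only; the numeric bounds, tolerance, and function names are invented placeholders, not values or code from the published work.

```python
# Sketch of the detection idea: flag audio whose estimated vocal-tract
# cross-sections look anatomically implausible. Bounds are illustrative
# placeholders, not figures from the paper.
import numpy as np

HUMAN_DIAMETER_CM = (0.9, 4.5)   # assumed plausible range; a drinking straw is ~0.6 cm

def looks_synthetic(diameters_cm, tolerance=0.2):
    """Return True if too many estimated cross-sections fall outside the human range.

    diameters_cm: per-section vocal-tract diameter estimates for one utterance.
    tolerance: fraction of implausible sections allowed before flagging.
    """
    d = np.asarray(diameters_cm, dtype=float)
    implausible = (d < HUMAN_DIAMETER_CM[0]) | (d > HUMAN_DIAMETER_CM[1])
    return implausible.mean() > tolerance

# Example: a straw-like, nearly uniform tract is flagged; a varied one is not.
print(looks_synthetic([0.6, 0.6, 0.6, 0.7, 0.6]))   # True
print(looks_synthetic([1.5, 2.8, 3.4, 2.1, 1.2]))   # False
```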
Source: Slashdot – Researchers Use Fluid Dynamics To Spot Artificial Imposter Voices