Forensic Similarity for Speech Deepfakes

📅 2025-10-03

🤖 AI Summary

This work addresses the provenance problem for speech deepfakes by proposing the first forensic similarity method for deepfake audio, which determines whether two audio samples originate from the same generative model. The method uses a two-stage deep network: a pretrained deepfake detector first extracts forensic features, and a lightweight similarity network then computes a consistency score for the pair. It makes no assumptions about specific forgery artifacts and needs no training on them, which lets it generalize to unseen deepfake techniques. On source verification it significantly outperforms baseline methods, and it extends to applications such as splice detection. The core contribution is bringing the forensic similarity paradigm to speech deepfakes, removing the reliance on known artifacts or model priors that limits traditional approaches.

📝 Abstract

In this paper, we introduce a digital audio forensics approach called Forensic Similarity for Speech Deepfakes, which determines whether two audio segments contain the same forensic traces. Our work is inspired by prior work on forensic similarity in the image domain, which demonstrated strong generalization to unknown forensic traces without requiring prior knowledge of them at training time. To achieve this in the audio setting, we propose a two-part deep-learning system composed of a feature extractor based on a speech deepfake detector backbone and a shallow neural network, referred to as the similarity network. This system maps pairs of audio segments to a score indicating whether they contain the same or different forensic traces. We evaluate the system on the emerging task of source verification, highlighting its ability to identify whether two samples originate from the same generative model. Additionally, we assess its applicability to splicing detection as a complementary use case. Experiments show that the method generalizes to a wide range of forensic traces, including previously unseen ones, illustrating its flexibility and practical value in digital audio forensics.
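
The two-part design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the feature extractor is a placeholder built from simple frame statistics (a real system would use a speech deepfake detector backbone), the weights are random and untrained, and the names `extract_features`, `similarity_score`, `forensic_similarity`, `EMB_DIM` and `HIDDEN` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 16  # hypothetical embedding size
HIDDEN = 8    # hypothetical hidden width of the shallow network

# Placeholder for the detector backbone: a fixed random projection.
W_feat = rng.standard_normal((4, EMB_DIM))


def extract_features(segment: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (NOT a real deepfake detector)."""
    usable = len(segment) // 160 * 160
    frames = segment[:usable].reshape(-1, 160)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-8)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    stats = np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])
    return np.tanh(stats @ W_feat)


# Shallow similarity network: one hidden layer, untrained random weights.
W1 = rng.standard_normal((2 * EMB_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
w2 = rng.standard_normal(HIDDEN) * 0.1
b2 = 0.0


def similarity_score(fa: np.ndarray, fb: np.ndarray) -> float:
    """Map a pair of embeddings to a score in (0, 1)."""
    # Order-invariant pair representation (an assumption of this sketch).
    pair = np.concatenate([np.abs(fa - fb), fa * fb])
    hidden = np.maximum(0.0, pair @ W1 + b1)
    logit = hidden @ w2 + b2
    return float(1.0 / (1.0 + np.exp(-logit)))


def forensic_similarity(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Full pipeline: extract features from both segments, then score."""
    return similarity_score(extract_features(seg_a), extract_features(seg_b))


seg_a = rng.standard_normal(16000)
seg_b = rng.standard_normal(16000)
score = forensic_similarity(seg_a, seg_b)
```

With trained weights, a score near 1 would be read as "same forensic traces" and a score near 0 as "different"; with the random weights used here the value carries no meaning beyond showing the data flow.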

Problem

Research questions and friction points this paper is trying to address.

Detecting shared forensic traces in audio segments

Verifying if speech samples originate from same generative model

Identifying audio manipulations through a deepfake detection system
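
As an illustration of how a pairwise score could support the splicing use case, the sketch below compares adjacent windows of one recording and flags boundaries where the score drops. This is one plausible procedure and not necessarily the one used in the paper. The function `find_splice_candidates`, the threshold of 0.5, and the `toy_score` stand-in (which only compares RMS levels) are all hypothetical; a trained similarity system would take the place of `toy_score`.

```python
from typing import Callable, List

import numpy as np


def find_splice_candidates(
    audio: np.ndarray,
    window: int,
    score_fn: Callable[[np.ndarray, np.ndarray], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return sample offsets where adjacent windows score as different."""
    n_windows = len(audio) // window
    segments = [audio[i * window:(i + 1) * window] for i in range(n_windows)]
    boundaries = []
    for i in range(n_windows - 1):
        if score_fn(segments[i], segments[i + 1]) < threshold:
            boundaries.append((i + 1) * window)
    return boundaries


def toy_score(a: np.ndarray, b: np.ndarray) -> float:
    """Toy stand-in: ratio of RMS levels, not a forensic measure."""
    ra = np.sqrt(np.mean(a ** 2))
    rb = np.sqrt(np.mean(b ** 2))
    return float(min(ra, rb) / (max(ra, rb) + 1e-12))


# Synthetic "spliced" signal: two noise sources with different levels.
rng = np.random.default_rng(0)
part_a = 0.1 * rng.standard_normal(4000)
part_b = 1.0 * rng.standard_normal(4000)
audio = np.concatenate([part_a, part_b])

candidates = find_splice_candidates(audio, window=1000, score_fn=toy_score)
```

The window length trades localization precision against how much audio each comparison sees; the right value would depend on the actual similarity system.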

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning system with a feature extractor and a similarity network

Maps audio pairs to forensic trace similarity scores

Generalizes to unseen forensic traces without prior knowledge
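
To make the "unseen forensic traces" setting concrete, the sketch below shows one way trial pairs for source verification could be built so that the test pairs only involve generators held out from training. The generator names, file names, and the `build_pairs` helper are hypothetical, and the paper's actual evaluation protocol may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_pairs(samples: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Label every pair of clips: 1 if same generator, 0 otherwise."""
    labeled = [(clip, gen) for gen, clips in samples.items() for clip in clips]
    return [
        (a, b, int(gen_a == gen_b))
        for (a, gen_a), (b, gen_b) in combinations(labeled, 2)
    ]


# Hypothetical corpus: generator name -> clips it produced.
corpus = {
    "gen_A": ["a1.wav", "a2.wav"],
    "gen_B": ["b1.wav", "b2.wav"],
    "gen_C": ["c1.wav", "c2.wav"],
    "gen_D": ["d1.wav", "d2.wav"],
}
held_out = {"gen_C", "gen_D"}  # generators never seen during training

train_pairs = build_pairs({g: c for g, c in corpus.items() if g not in held_out})
test_pairs = build_pairs({g: c for g, c in corpus.items() if g in held_out})
```

Because the split is made by generator and not by clip, a system that scores well on `test_pairs` cannot have memorized the traces of those generators.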

🔎 Similar Papers

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection