TwinShift: Benchmarking Audio Deepfake Detection across Synthesizer and Speaker Shifts

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Audio deepfake detectors generalize poorly to unseen synthesis methods and speakers, which undermines their reliability in real-world deployment. To address this, we propose TWINSHIFT, a benchmark that explicitly decouples synthesis-model identity from speaker identity: it pairs six state-of-the-art generative systems with mutually exclusive speaker sets and evaluates detectors under a dual-shift zero-shot protocol (cross-synthesizer and cross-speaker). Systematic experiments reveal substantial performance degradation (an average drop of 32.7%) under strict zero-shot conditions, exposing critical robustness blind spots. TWINSHIFT thus provides a reproducible, standardized testbed that shifts evaluation away from the unrealistic i.i.d. assumption toward strong generalization, which is essential for practical deployment. By rigorously isolating synthesis-model and speaker variability, it establishes a foundational evaluation framework and concrete directions for developing the next generation of robust audio deepfake detection systems.

📝 Abstract
Audio deepfakes pose a growing threat, already exploited in fraud and misinformation. A key challenge is ensuring detectors remain robust to unseen synthesis methods and diverse speakers, since generation techniques evolve quickly. Despite strong benchmark results, current systems struggle to generalize to new conditions, limiting real-world reliability. To address this, we introduce TWINSHIFT, a benchmark explicitly designed to evaluate detection robustness under strictly unseen conditions. Our benchmark is constructed from six different synthesis systems, each paired with disjoint sets of speakers, allowing for a rigorous assessment of how well detectors generalize when both the generative model and the speaker identity change. Through extensive experiments, we show that TWINSHIFT reveals important robustness gaps, uncovers overlooked limitations, and provides principled guidance for developing audio deepfake detection (ADD) systems. The TWINSHIFT benchmark can be accessed at https://github.com/intheMeantime/TWINSHIFT.
Problem

Research questions and friction points this paper is trying to address.

Evaluating audio deepfake detection robustness under unseen conditions
Assessing detector generalization across different synthesis methods and speakers
Addressing reliability gaps in current audio deepfake detection systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates detection under unseen conditions
Uses six synthesis systems with disjoint speaker sets
Assesses generalization across model and speaker changes
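The core of the protocol described above is that a detector is tested on clips whose synthesizer and speakers were both absent from training. A minimal sketch of how such dual-shift splits could be constructed, assuming each fake clip is labelled with its (synthesizer, speaker) pair; the function and data names here are illustrative, not taken from the TWINSHIFT release:

```python
from itertools import product

def dual_shift_splits(samples):
    """samples: list of (synthesizer, speaker) pairs labelling fake clips.

    Yields (held_out_synth, train, test) where the test synthesizer AND all
    test speakers are unseen during training (strict zero-shot).
    """
    synths = sorted({s for s, _ in samples})
    for held_out_synth in synths:
        # Test set: every clip from the held-out synthesizer.
        test = [x for x in samples if x[0] == held_out_synth]
        test_speakers = {spk for _, spk in test}
        # Training pool: other synthesizers, restricted to speakers
        # disjoint from the held-out speaker set.
        train = [x for x in samples if x[0] != held_out_synth
                 and x[1] not in test_speakers]
        yield held_out_synth, train, test

# Toy example: 3 synthesizers, each paired with its own disjoint speakers.
samples = [(f"synth{i}", f"spk{i}{j}") for i, j in product(range(3), range(2))]
for synth, train, test in dual_shift_splits(samples):
    train_speakers = {spk for _, spk in train}
    test_speakers = {spk for _, spk in test}
    assert not train_speakers & test_speakers   # speaker sets are disjoint
    assert all(s != synth for s, _ in train)    # synthesizer is unseen
```

Because each synthesizer in the benchmark comes with its own speaker set, holding out one synthesizer automatically holds out its speakers too, which is what makes the zero-shot condition "strict".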
Jiyoung Hong
Ewha Womans University
Yoonseo Chung
Ewha Womans University
Seungyeon Oh
Ewha Womans University
Juntae Kim
SK Telecom, Seoul, Republic of Korea
Jiyoung Lee
Assistant Professor, Ewha Womans University
Multimodal Learning · Computer Vision · Machine Learning
Sookyung Kim
PARC (Palo Alto Research Center)
LLM Post-training · Reinforcement Learning · AI-driven Drug Discovery · Climate AI
Hyunsoo Cho
Ewha Womans University