🤖 AI Summary
Audio deepfake detectors generalize poorly to unseen synthesis methods and speakers, undermining their reliability in real-world deployment. To address this, we propose TWINSHIFT, a benchmark that explicitly decouples synthesis-model identity from speaker identity: it covers six state-of-the-art generative systems, each paired with mutually exclusive speaker sets, and introduces a dual-transfer evaluation protocol of cross-synthesizer and cross-speaker zero-shot detection. Systematic experiments reveal substantial performance degradation (an average drop of 32.7%) under strict zero-shot conditions, exposing critical robustness blind spots. TWINSHIFT provides a reproducible, standardized testbed and shifts evaluation away from the unrealistic i.i.d. assumption toward strong generalization, which is essential for practical deployment. By rigorously isolating synthesis-model and speaker variability, it establishes a foundational evaluation framework and concrete optimization directions for next-generation robust audio deepfake detection systems.
📝 Abstract
Audio deepfakes pose a growing threat and are already exploited in fraud and misinformation. A key challenge is ensuring that detectors remain robust to unseen synthesis methods and diverse speakers, since generation techniques evolve quickly. Despite strong benchmark results, current systems struggle to generalize to new conditions, limiting their real-world reliability. To address this, we introduce TWINSHIFT, a benchmark explicitly designed to evaluate detection robustness under strictly unseen conditions. Our benchmark is constructed from six different synthesis systems, each paired with disjoint sets of speakers, allowing a rigorous assessment of how well detectors generalize when both the generative model and the speaker identity change. Through extensive experiments, we show that TWINSHIFT reveals important robustness gaps, uncovers overlooked limitations, and provides principled guidance for developing audio deepfake detection (ADD) systems. The TWINSHIFT benchmark can be accessed at https://github.com/intheMeantime/TWINSHIFT.
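The dual-transfer protocol described above can be sketched as a data-splitting routine: test clips must come from a synthesizer *and* speakers that never appear in training. The sketch below is illustrative only; the dataclass fields, function name, and synthesizer/speaker labels are hypothetical, not the benchmark's actual API.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Clip:
    # Hypothetical record for one audio sample in the benchmark.
    synthesizer: str  # which generative system produced it
    speaker: str      # identity of the (cloned) speaker
    path: str         # path to the audio file

def dual_transfer_split(
    clips: List[Clip],
    held_out_synth: str,
    held_out_speakers: Set[str],
) -> Tuple[List[Clip], List[Clip]]:
    """Zero-shot split: the test set uses a synthesizer AND speakers
    that are both entirely absent from the training set."""
    train = [c for c in clips
             if c.synthesizer != held_out_synth
             and c.speaker not in held_out_speakers]
    test = [c for c in clips
            if c.synthesizer == held_out_synth
            and c.speaker in held_out_speakers]
    return train, test
```

Because the two filters are mutually exclusive on both axes, any detector evaluated on `test` has seen neither the generative model nor the speakers at training time, which is the strict zero-shot condition the benchmark enforces.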