Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

📅 2025-08-28
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the limited robustness of audio deepfake detection against unmodified, compressed, and laundered (i.e., anti-detection) speech. To tackle this, we propose a multilingual, multi-synthesis-source data integration strategy. Methodologically, we integrate a WavLM Large self-supervised front-end and RawBoost acoustic augmentation within the AASIST architecture, enabling joint modeling across languages and distortion types. Crucially, the approach relies on joint training over diverse multilingual, multi-source synthetic data, substantially improving generalization under complex adversarial conditions, including real-world noise, compression artifacts, and active evasion attacks. Evaluated on the SAFE Challenge, the method achieves second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong robustness and practical efficacy in realistic, challenging scenarios.
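The summary mentions RawBoost acoustic augmentation, which distorts training waveforms to improve robustness. As a rough illustration of one of its distortion types, the sketch below adds signal-dependent additive noise at a target SNR; the real RawBoost algorithm also applies convolutive and stationary coloured noise, and the function name and SNR handling here are simplified assumptions, not the paper's implementation.

```python
import math
import random

def add_signal_dependent_noise(x, snr_db, seed=0):
    """Add noise whose amplitude tracks the signal, at a rough target SNR (dB).

    Simplified stand-in for one RawBoost distortion type: the noise at each
    sample is scaled by that sample's magnitude, so louder regions receive
    proportionally more distortion.
    """
    rng = random.Random(seed)
    signal_power = sum(s * s for s in x) / len(x)
    noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(noise_power)
    return [s + scale * abs(s) * rng.gauss(0.0, 1.0) for s in x]
```

Lowering `snr_db` strengthens the distortion; augmenting each batch with a randomly drawn SNR is a common way to expose the detector to varied noise levels.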

📝 Abstract
The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach incorporates WavLM large frontend with RawBoost augmentation, trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.
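The abstract compares two fixed audio length configurations, which implies cropping or padding every utterance to a constant sample count before it reaches the SSL front-end. A minimal helper for that step is sketched below; the repeat-padding strategy and the 4 s / 16 kHz example are assumptions for illustration, as the paper does not state its exact lengths here.

```python
def fix_length(x, num_samples):
    """Crop or repeat-pad a waveform to a fixed sample count.

    Example target: 4 s at 16 kHz -> 64000 samples (illustrative values).
    Short clips are repeated rather than zero-padded, a common alternative
    that avoids long silent tails.
    """
    if len(x) >= num_samples:
        return x[:num_samples]
    reps = -(-num_samples // len(x))  # ceiling division
    return (x * reps)[:num_samples]
```

Applying this uniformly to every corpus lets heterogeneous datasets be batched together regardless of their native utterance durations.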
Problem

Research questions and friction points this paper is trying to address.

Detecting synthetic speech in unmodified and processed audio
Improving robustness against compression and laundering artifacts
Evaluating multilingual datasets for generalized deepfake detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning frontends for detection
Multilingual dataset integration with 256k samples
RawBoost augmentation and AASIST-based architecture
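The integration of multiple corpora (CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, MAILABS) into one 256,600-sample training pool can be sketched as a simple merge-and-shuffle step. The `(utterance_id, label)` convention and function name below are hypothetical, not the paper's actual data format.

```python
import random

def build_training_pool(corpora, seed=0):
    """Merge several corpora into one shuffled training list.

    `corpora` maps a source name (e.g. "MLAAD_v5", "SpoofCeleb") to a list
    of (utterance_id, label) pairs; keeping the source name in each entry
    allows per-corpus sampling statistics to be tracked later.
    """
    pool = [
        (source, utt_id, label)
        for source, items in corpora.items()
        for utt_id, label in items
    ]
    # Deterministic shuffle so that experiments over data versions
    # remain reproducible.
    random.Random(seed).shuffle(pool)
    return pool
```

Building distinct pools from different corpus subsets is one plausible way the three training data versions mentioned in the abstract could be produced.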
Hashim Ali
Assistant Professor of Computer Science, Abdul Wali Khan University Mardan, KPK, Pakistan
Cloud Computing, Machine Learning, Software Engineering
Surya Subramani
Electrical and Computer Engineering, University of Michigan, Dearborn, USA
Lekha Bollinani
Electrical and Computer Engineering, University of Michigan, Dearborn, USA
Nithin Sai Adupa
Electrical and Computer Engineering, University of Michigan, Dearborn, USA
Sali El-Loh
Electrical and Computer Engineering, University of Michigan, Dearborn, USA
Hafiz Malik
University of Michigan - Dearborn
Deepfakes, AI, Cybersecurity, CPS Security, Information Fusion