AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

📅 2026-04-09
🤖 AI Summary
This work addresses the limited generalizability of existing audio deepfake detection methods, which predominantly focus on speech and struggle in real-world scenarios involving environmental interference, novel generation techniques, and non-speech content such as sound effects, singing, and music. To bridge this gap, the study introduces the first comprehensive benchmark for deepfake detection across all audio types, featuring two challenge tracks (robust speech and full-spectrum audio) supported by standardized datasets, rigorous evaluation protocols, and reproducible baseline models. By fostering the development of generation-agnostic and content-type-independent detection approaches, the framework aims to improve robustness against unseen forgery methods and cross-domain generalization, advancing scalable audio forensics for multimedia security.
📝 Abstract
The rapid advancement of Audio Large Language Models (ALLMs) has enabled cost-effective, high-fidelity generation and manipulation of both speech and non-speech audio, including sound effects, singing voices, and music. While these capabilities foster creativity and content production, they also introduce significant security and trust challenges, as realistic audio deepfakes can now be generated and disseminated at scale. Existing audio deepfake detection (ADD) countermeasures (CMs) and benchmarks, however, remain largely speech-centric, often relying on speech-specific artifacts and exhibiting limited robustness to real-world distortions, as well as restricted generalization to heterogeneous audio types and emerging spoofing techniques. To address these gaps, we propose the All-Type Audio Deepfake Detection (AT-ADD) Grand Challenge for ACM Multimedia 2026, designed to bridge controlled academic evaluation with practical multimedia forensics. AT-ADD comprises two tracks: (1) Robust Speech Deepfake Detection, which evaluates detectors under real-world scenarios and against unseen, state-of-the-art speech generation methods; and (2) All-Type Audio Deepfake Detection, which extends detection beyond speech to diverse, unknown audio types and promotes type-agnostic generalization across speech, sound, singing, and music. By providing standardized datasets, rigorous evaluation protocols, and reproducible baselines, AT-ADD aims to accelerate the development of robust and generalizable audio forensic technologies, supporting secure communication, reliable media verification, and responsible governance in an era of pervasive synthetic audio.
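The abstract promises "rigorous evaluation protocols" without naming a metric here; audio deepfake detection challenges conventionally rank systems by Equal Error Rate (EER), the operating point where the false-accept and false-reject rates coincide. As a hedged illustration (not the challenge's official scoring code), a minimal EER sketch over two score arrays, assuming higher scores mean "more likely bona fide":

```python
import numpy as np

def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray) -> float:
    """Equal Error Rate: the threshold where the false-reject rate on
    bona fide audio equals the false-accept rate on spoofed audio."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones(len(bonafide_scores)),
                             np.zeros(len(spoof_scores))])
    order = np.argsort(scores)          # sweep thresholds in ascending order
    labels = labels[order]
    # At each candidate threshold, everything at or below it is rejected.
    frr = np.cumsum(labels) / labels.sum()              # bona fide rejected
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # spoof accepted
    idx = int(np.argmin(np.abs(far - frr)))             # closest crossing point
    return float((far[idx] + frr[idx]) / 2)

# Perfectly separated scores yield an EER of 0.
print(compute_eer(np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.2, 0.3])))  # → 0.0
```

In practice challenge toolkits interpolate the exact crossing of the two error curves rather than taking the nearest discrete point, but the midpoint approximation above is adequate for intuition.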
Problem

Research questions and friction points this paper is trying to address.

Audio Deepfake Detection, Generalization, Robustness, Non-speech Audio, Multimedia Forensics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio Deepfake Detection, All-Type Audio, Generalization, Multimedia Forensics, Robust Evaluation