ESDD 2026: Environmental Sound Deepfake Detection Challenge Evaluation Plan

📅 2025-08-06

🤖 AI Summary

To address the lack of large-scale, diverse benchmark datasets for environmental sound deepfake detection (ESDD), this work uses EnvSDD, the first large-scale curated ESDD dataset. EnvSDD comprises 45.25 hours of real and 316.7 hours of synthetic environmental audio, spanning multiple acoustic scenes and generative models. Building on EnvSDD, we organize the ESDD 2026 challenge with two tracks that reflect real-world conditions: "ESDD in Unseen Generators" and "Black-Box Low-Resource ESDD." We further propose an acoustic-feature-driven deep learning framework and systematically evaluate model generalizability, robustness, and few-shot adaptability. EnvSDD fills a gap in ESDD benchmarking and supports the practical deployment of audio forensic technologies.

📝 Abstract

Recent advances in audio generation systems have enabled the creation of highly realistic and immersive soundscapes, which are increasingly used in film and virtual reality. However, these audio generators also raise concerns about potential misuse, such as generating deceptive audio content for fake videos and spreading misleading information. Existing datasets for environmental sound deepfake detection (ESDD) are limited in scale and audio types. To address this gap, we have proposed EnvSDD, the first large-scale curated dataset designed for ESDD, consisting of 45.25 hours of real and 316.7 hours of fake sound. Based on EnvSDD, we are launching the Environmental Sound Deepfake Detection Challenge. Specifically, we present two different tracks: ESDD in Unseen Generators and Black-Box Low-Resource ESDD, covering various challenges encountered in real-life scenarios. The challenge will be held in conjunction with the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026).
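The two durations quoted above imply a strongly imbalanced dataset. The sketch below is only an illustration computed from the abstract's figures. It measures composition by duration, because this section does not give clip counts. By duration, EnvSDD holds roughly 7 hours of fake audio for every hour of real audio, and real audio makes up about 12.5% of the total. One consequence: a detector that labels everything "fake" would be correct on about 87.5% of the audio. This suggests that class-balanced or threshold-free metrics say more than raw accuracy. The challenge's actual evaluation metric is not stated in this section. The function name `composition` is hypothetical.

```python
# Hours of audio in EnvSDD, as stated in the abstract.
REAL_HOURS = 45.25
FAKE_HOURS = 316.7


def composition(real_hours: float, fake_hours: float) -> dict:
    """Hypothetical helper: duration-based class composition of a real/fake dataset.

    Returns total duration, the fake-to-real duration ratio, and each
    class's share of the total duration (not of the clip count).
    """
    total = real_hours + fake_hours
    return {
        "total_hours": total,
        "fake_to_real_ratio": fake_hours / real_hours,
        "real_share": real_hours / total,
        "fake_share": fake_hours / total,
    }


if __name__ == "__main__":
    stats = composition(REAL_HOURS, FAKE_HOURS)
    for key, value in stats.items():
        print(f"{key}: {value:.3f}")
```

Running it prints a total of 361.950 hours, a fake-to-real ratio of 6.999, a real share of 0.125 and a fake share of 0.875.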

Problem

Detecting deceptive environmental audio deepfakes for misuse prevention

Addressing limited scale and diversity in current ESDD datasets

Evaluating detection models in real-world black-box low-resource scenarios

Innovation

Large-scale curated dataset EnvSDD

Two challenge tracks: ESDD in Unseen Generators and Black-Box Low-Resource ESDD

Challenge held in conjunction with ICASSP 2026

🔎 Similar Papers

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

2024-09-23, arXiv.org

Audio Anti-Spoofing Detection: A Survey

2024-04-22, arXiv.org