Synthetic Audio Forensics Evaluation (SAFE) Challenge

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
The increasing realism of text-to-speech (TTS) synthesis, coupled with evasive post-processing such as compression, resampling, and laundering, poses significant challenges for audio forensic detection. Method: This paper introduces a fully blind, multi-stage, systematic evaluation framework. It combines 17 mainstream TTS models and 21 real-world recording sources into a 21,000-sample, 90-hour multitask benchmark spanning three progressively challenging scenarios: pristine, compressed, and laundered audio. The dataset encompasses diverse real-world recordings and multiple laundering attacks. Contributions/Results: (1) We release the first large-scale, robustness-oriented benchmark for synthetic speech detection, featuring three distinct subtasks; (2) we empirically demonstrate substantial performance degradation of existing detectors on laundered audio; and (3) we advance standardization in audio forensics by establishing a reproducible, comprehensive evaluation paradigm, providing both a rigorous baseline and a clear technical roadmap for future research.

📝 Abstract
The increasing realism of synthetic speech generated by advanced text-to-speech (TTS) models, coupled with post-processing and laundering techniques, presents a significant challenge for audio forensic detection. In this paper, we introduce the SAFE (Synthetic Audio Forensics Evaluation) Challenge, a fully blind evaluation framework designed to benchmark detection models across progressively harder scenarios: raw synthetic speech, processed audio (e.g., compression, resampling), and laundered audio intended to evade forensic analysis. The SAFE Challenge comprised 90 hours of audio: 21,000 samples drawn from 21 different real sources and 17 different TTS models, split across 3 tasks. We present the challenge, its evaluation design and tasks, dataset details, and initial insights into the strengths and limitations of current approaches, offering a foundation for advancing synthetic audio detection research. More information is available at https://stresearch.github.io/SAFE/.
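The processed and laundered conditions center on lossy transforms such as compression and resampling. As a hedged illustration only (the challenge's actual processing pipeline is not specified here), a toy laundering chain might decimate a waveform and round-trip it through 8-bit mu-law companding, the kind of transform that degrades the subtle artifacts detectors rely on:

```python
import math

def mu_law_compress(x, mu=255):
    # Mu-law companding, a simple stand-in for lossy speech coding
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255):
    # Inverse of mu_law_compress
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y)

def launder(samples, decimate=2, mu=255):
    """Toy laundering chain: resample by decimation, then round-trip
    through mu-law companding with 8-bit quantization."""
    out = []
    for s in samples[::decimate]:
        y = mu_law_compress(s, mu)
        q = round(y * 127) / 127  # quantize to 8-bit code levels
        out.append(mu_law_expand(q, mu))
    return out

# 20 ms of a 1 kHz tone sampled at 16 kHz
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(320)]
laundered = launder(tone)
print(len(laundered))  # 160 samples after 2x decimation
```

The laundered signal stays perceptually close to the original (quantization error stays within a few percent), which is exactly why such transforms are attractive as evasion: they preserve intelligibility while disturbing the low-level statistics forensic detectors use.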
Problem

Research questions and friction points this paper is trying to address.

Evaluating detection of synthetic speech in raw, processed, and laundered audio
Benchmarking forensic models against advanced TTS and evasion techniques
Assessing current approaches' limitations in synthetic audio forensics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blind evaluation framework for audio forensic detection
Benchmarking detection models across harder scenarios
Evaluating raw, processed, and laundered synthetic audio
Kirill Trapeznikov, Systems & Technology Research
Paul Cummer, STR, Woburn, MA
Pranay Pherwani, STR, Woburn, MA
Jai Aslam, STR, Woburn, MA
Michael S. Davinroy, PhD Student, Northeastern University
Peter Bautista, Aptima, Inc., Woburn, MA
Laura Cassani, Aptima, Inc., Woburn, MA
Matthew Stamm, Drexel University, Philadelphia, PA
Jill Crisman, ULRI Digital Safety Research Institute, Northbrook, IL