🤖 AI Summary
Current deepfake speech detection is hindered by scarce source-tracing data and sparse metadata, limiting model provenance and attribution analysis. To address this, we introduce STOPA, the first systematic, metadata-rich source-tracing benchmark dataset, encompassing eight acoustic models and six vocoders, with 700K high-fidelity samples. We propose a multidimensional controllable-variable design, orthogonally decoupling the acoustic model, vocoder, model weights, and synthesis parameters, to enable fine-grained, interpretable labeling. Samples are systematically synthesized using mainstream frameworks (FastSpeech2, WaveNet, HiFi-GAN) under standardized preprocessing and unified metadata modeling. Experiments demonstrate substantial improvements in open-set source identification robustness, with attribution accuracy at both the acoustic-model and vocoder levels increasing by 12.3%. This benchmark provides a reproducible foundation for deepfake detection, forensic audio authentication, and generative model auditing.
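The orthogonal decoupling of generative factors described above implies a per-sample metadata record in which each factor varies independently. The sketch below illustrates what such a record might look like; the class, field names, and values are hypothetical examples for illustration, not STOPA's actual schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical per-sample record with orthogonal generative factors."""
    sample_id: str
    acoustic_model: str           # e.g. "FastSpeech2" (one of 8 AMs)
    vocoder: str                  # e.g. "HiFi-GAN" (one of 6 VMs)
    weights_id: str               # which pretrained checkpoint was used
    synthesis_params: dict = field(default_factory=dict)

    def attribution_label(self) -> tuple:
        # The decoupled factors combine into a fine-grained,
        # interpretable source label for attribution.
        return (self.acoustic_model, self.vocoder, self.weights_id)

# Illustrative record; all values are invented for the example.
meta = SampleMetadata(
    sample_id="sample_000001",
    acoustic_model="FastSpeech2",
    vocoder="HiFi-GAN",
    weights_id="checkpoint_v1",
    synthesis_params={"sampling_rate": 22050},
)
print(meta.attribution_label())
```

Because each axis is controlled independently, a source-tracing model can be evaluated on any single factor (e.g. vocoder only) or on the full combined label.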
📝 Abstract
A key research area in deepfake speech detection is source tracing: determining the origin of synthesised utterances. Approaches may involve identifying the acoustic model (AM), vocoder model (VM), or other generation-specific parameters. However, progress is limited by the lack of a dedicated, systematically curated dataset. To address this, we introduce STOPA, a systematically varied and metadata-rich dataset for deepfake speech source tracing, covering 8 AMs, 6 VMs, and diverse parameter settings across 700k samples from 13 distinct synthesisers. Unlike existing datasets, which often feature limited variation or sparse metadata, STOPA provides a systematically controlled framework covering a broader range of generative factors, such as the choice of vocoder model, acoustic model, or pretrained weights, ensuring higher attribution reliability. This control improves attribution accuracy, aiding forensic analysis, deepfake detection, and generative model transparency.