How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the ambiguity in labeling resynthesized audio generated by neural audio codecs—a class of models that combine compression and synthesis capabilities—within the context of voice spoofing detection. The study presents the first systematic analysis of this labeling challenge, introducing an extended version of the ASVspoof 5 dataset and proposing multiple annotation strategies tailored to resynthesized audio. A unified evaluation framework is designed to assess the impact of different labeling approaches on anti-spoofing systems, leveraging resynthesis techniques that integrate neural codecs with vocoders. Experimental results demonstrate that the choice of annotation strategy significantly influences detection performance, offering critical insights for future dataset construction and evaluation protocols in audio deepfake detection research.

📝 Abstract
Since Text-to-Speech systems typically don't produce waveforms directly, recent spoof detection studies use resynthesized waveforms from vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed for compressing audio for storage and transmission. However, their ability to discretize speech also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
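
The labeling choice described in the abstract can be made concrete with a small sketch. This is not the authors' protocol: the `Source` and `Policy` names, the `EXCLUDE_CODEC` option, and the helper functions are hypothetical, and the sketch assumes that codec-resynthesized utterances are derived from bonafide recordings. It only shows how one pool of utterances yields different training labels depending on whether codec output is treated as bonafide or as spoof.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class Source(Enum):
    """Where an utterance came from (hypothetical categories)."""
    ORIGINAL = "original"        # untouched bonafide recording
    CODEC_RESYNTH = "codec"      # bonafide audio passed through a neural codec
    SYNTHETIC = "synthetic"      # output of a speech synthesis attack


class Policy(Enum):
    """How to label codec-resynthesized audio (hypothetical options)."""
    CODEC_AS_BONAFIDE = "codec_as_bonafide"  # codec seen as compression
    CODEC_AS_SPOOF = "codec_as_spoof"        # codec seen as synthesis
    EXCLUDE_CODEC = "exclude_codec"          # assumption: drop ambiguous data


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    source: Source


def assign_label(utt: Utterance, policy: Policy) -> Optional[str]:
    """Return 'bonafide', 'spoof', or None if the utterance is excluded."""
    if utt.source is Source.ORIGINAL:
        return "bonafide"
    if utt.source is Source.SYNTHETIC:
        return "spoof"
    # Only codec-resynthesized audio is ambiguous; the policy decides.
    if policy is Policy.CODEC_AS_BONAFIDE:
        return "bonafide"
    if policy is Policy.CODEC_AS_SPOOF:
        return "spoof"
    return None


def build_protocol(utts: Iterable[Utterance], policy: Policy) -> Dict[str, str]:
    """Map utterance IDs to labels, skipping excluded utterances."""
    protocol = {}
    for utt in utts:
        label = assign_label(utt, policy)
        if label is not None:
            protocol[utt.utt_id] = label
    return protocol


if __name__ == "__main__":
    pool = [
        Utterance("utt_001", Source.ORIGINAL),
        Utterance("utt_002", Source.CODEC_RESYNTH),
        Utterance("utt_003", Source.SYNTHETIC),
    ]
    for policy in Policy:
        print(policy.value, build_protocol(pool, policy))
```

Training the same detector once per policy and comparing the results on a fixed evaluation set is one way to measure how much the labeling choice matters, which is the kind of comparison the abstract describes.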
Problem

Research questions and friction points this paper is trying to address.

audio deepfake detection
neural audio codecs
resynthesized audio
labeling ambiguity
spoof detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural audio codecs
audio deepfake detection
resynthesized audio
labeling strategy
ASVspoof
Yixuan Xiao
University of Stuttgart, Institute for Natural Language Processing, Germany
Florian Lux
Speech Technology Scientist, AppTek
Speech Synthesis, Natural Language Processing, Machine Learning, Artificial Intelligence
Alejandro Pérez-González-de-Martos
AppTek GmbH, Germany
Ngoc Thang Vu
University of Stuttgart, Institute for Natural Language Processing, Germany