🤖 AI Summary
Neural codec source tracing under open-set conditions remains challenging: existing source tracing methods consider only closed-set scenarios and do not address unknown generation methods or out-of-distribution (OOD) audio. Method: We formalize Neural Codec Source Tracing (NCST) as a task that combines open-set neural codec classification with interpretable detection of audio language model (ALM) output, and introduce ST-Codecfake, a bilingual dataset of audio generated by 11 neural codec methods plus ALM-based OOD test samples. We also establish a source tracing benchmark for evaluating NCST models under open-set conditions. Contribution/Results: Experiments show that NCST models perform well in both in-distribution (ID) source classification and OOD detection, but lack robustness when classifying unseen real audio. The ST-Codecfake dataset and source code are available.
📝 Abstract
Current research in audio deepfake detection is gradually transitioning from binary classification to multi-class tasks, referred to as audio deepfake source tracing. However, existing studies on source tracing consider only closed-set scenarios and do not address the challenges posed by open-set conditions. In this paper, we define the Neural Codec Source Tracing (NCST) task, which covers open-set neural codec classification and interpretable ALM detection. Specifically, we construct the ST-Codecfake dataset for the NCST task, which includes bilingual audio samples generated by 11 state-of-the-art neural codec methods and ALM-based out-of-distribution (OOD) test samples. Furthermore, we establish a comprehensive source tracing benchmark to assess NCST models under open-set conditions. The experimental results reveal that although the NCST models perform well in in-distribution (ID) classification and OOD detection, they lack robustness in classifying unseen real audio. The ST-Codecfake dataset and code are available.