AI Summary
Current audio deepfake detectors generalize poorly to unseen synthesis methods (zero-day attacks), and fine-tuning detectors for each new method is too slow for rapid-response requirements. This paper proposes a training-free zero-day detection framework, the first to integrate knowledge retrieval with voiceprint contour matching. It uses pretrained models to extract speech representations, performs similarity search over a large-scale retrieval pool, and fuses multi-granularity voiceprint features for robust matching. Because no model retraining is needed, the approach supports immediate deployment and scalable updates. On the DeepFake-Eval-2024 benchmark, it achieves detection performance comparable to fully fine-tuned models. Ablation studies confirm that retrieval pool size and voiceprint attribute design are critical to accuracy.
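The retrieval step described above can be sketched as a nearest-neighbor lookup over a labeled pool of embeddings. This is a minimal illustration, not the paper's implementation: the function name `retrieve_and_classify` and the use of cosine-similarity k-NN with majority voting are assumptions; in practice the embeddings would come from a pretrained speech foundation model, which is not shown here.

```python
import numpy as np

def retrieve_and_classify(query_emb, pool_embs, pool_labels, k=3):
    """Training-free detection sketch: k-NN over a labeled retrieval pool.

    query_emb   -- embedding of the audio under test (1-D array)
    pool_embs   -- embeddings of pool samples, one per row (2-D array)
    pool_labels -- 0 = bona fide, 1 = deepfake, aligned with pool_embs
    Returns the majority-vote label among the k most similar pool items.
    """
    # Cosine similarity via L2-normalized dot products
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q
    # Indices of the k highest-similarity pool samples
    top_k = np.argsort(sims)[::-1][:k]
    votes = pool_labels[top_k]
    # Majority vote: flag as deepfake if more than half the neighbors are fakes
    return int(votes.sum() * 2 > k)
```

Because the detector is just a lookup, adding samples from a newly observed synthesis method only requires appending their embeddings to the pool, which is what makes the approach immediately deployable without retraining.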
Abstract
Modern audio deepfake detectors built on foundation models and large training datasets achieve promising detection performance. However, they struggle with zero-day attacks, where the audio samples are generated by novel synthesis methods absent from the training data. Conventional defenses against such attacks require fine-tuning the detectors, which is problematic when a prompt response is required. This study introduces a training-free framework for zero-day audio deepfake detection based on knowledge representations, retrieval augmentation, and voice profile matching. Within this framework, we propose simple yet effective knowledge retrieval and ensemble methods that achieve performance comparable to fine-tuned models on DeepFake-Eval-2024, without any additional model training. We also conduct ablation studies on retrieval pool size and voice profile attributes, validating their relevance to system efficacy.
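The ensemble of retrieval and voice-profile matching mentioned in the abstract could be as simple as a weighted score fusion. The sketch below is hypothetical: the function name `ensemble_score`, the mean over per-attribute profile similarities, and the weight values are all illustrative assumptions, not the paper's stated method.

```python
import numpy as np

def ensemble_score(retrieval_sim, profile_sims, weights=(0.6, 0.4)):
    """Fuse a retrieval-based similarity with voice-profile similarities.

    retrieval_sim -- scalar similarity from the retrieval pool lookup
    profile_sims  -- per-attribute voice-profile similarities (multi-granularity
                     voiceprint features, e.g. one score per attribute)
    weights       -- illustrative fusion weights for the two branches
    Returns a single score; higher can be read as more likely bona fide
    (or more likely fake, depending on how the similarities are defined).
    """
    # Collapse the multi-granularity profile scores into one branch score
    profile = float(np.mean(profile_sims))
    w_retrieval, w_profile = weights
    return w_retrieval * retrieval_sim + w_profile * profile
```

A linear fusion like this keeps the system training-free: the weights can be tuned on a small validation set without touching any model parameters.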