Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of audio deepfake detection models to unseen attacks by generating hard examples through diffusion-based reconstruction. The synthesized hard samples augment the training data, and the detector combines multi-layer feature aggregation with a Regularization-Assisted Contrastive Learning (RACL) objective to better discriminate previously unseen forgery types. The authors report that their best model significantly reduces the average Equal Error Rate (EER) relative to the baseline when evaluated on unseen attacks.
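One common way to aggregate features from several encoder layers is a softmax-weighted sum with one learnable scalar per layer. The summary does not say which aggregation scheme the paper uses, so the sketch below is only an illustration of the general idea; the function name `aggregate_layers` and the parameter `layer_logits` are hypothetical.

```python
import math


def aggregate_layers(layer_features, layer_logits):
    """Softmax-weighted sum of per-layer feature vectors.

    layer_features: list of equal-length feature vectors, one per layer.
    layer_logits:   one scalar per layer (learnable in a real model;
                    fixed here for illustration).
    This is an assumed scheme, not the paper's confirmed method.
    """
    # Numerically stable softmax over the per-layer logits.
    peak = max(layer_logits)
    exps = [math.exp(x - peak) for x in layer_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum across layers, computed dimension by dimension.
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]


# Two layers with equal logits are averaged.
pooled = aggregate_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

With equal logits the result is a plain average of the layers; during training the logits would shift weight toward the layers that carry the most useful forgery cues.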

📝 Abstract

Achieving robust generalization against unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framework centered on hard sample classification. The core idea is that a model capable of distinguishing challenging hard samples is inherently equipped to handle simpler cases effectively. We investigate multiple reconstruction paradigms, identifying the diffusion-based method as optimal for generating hard samples. Furthermore, we leverage multi-layer feature aggregation and introduce a Regularization-Assisted Contrastive Learning (RACL) objective to enhance generalizability. Experiments demonstrate the superior generalization of our approach, with our best model achieving a significant reduction in the average Equal Error Rate (EER) compared to the baseline.
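For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how EER can be estimated from detector scores, assuming higher scores mean "more likely bona fide"; the function name and the toy scores are illustrative and do not come from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate (eer, threshold) by scanning every observed score as a threshold.

    A trial is accepted as bona fide when its score is >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
eer, threshold = equal_error_rate(
    bonafide_scores=[0.9, 0.8, 0.7, 0.3],
    spoof_scores=[0.1, 0.2, 0.4, 0.6],
)
```

On this toy data one bona fide trial and one spoof trial are misclassified at the selected threshold, giving an EER of 25%. An "average EER" such as the one reported in the abstract would typically be the mean of this quantity over several evaluation sets.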

Problem

Research questions and friction points this paper is trying to address.

Audio Deepfake Detection

Generalization

Unseen Attacks

Generative Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Reconstruction

Hard Sample Generation

Audio Deepfake Detection

Contrastive Learning