ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing audio deepfake detection methods to realistic in-the-wild deepfakes, particularly forgery types not seen before. The authors propose ICLAD, a training-free in-context learning framework built on comparison-guided reasoning: a pairwise comparative strategy that guides an audio language model (ALM) to discover and filter hallucinations and deepfake-irrelevant acoustic attributes. A routing mechanism sends out-of-distribution samples to the ALM, which works alongside a specialized deepfake detector and provides textual rationales for its predictions. On in-the-wild datasets, the method improves macro F1 over the specialized detector, with up to a 2× relative improvement, and further analysis points to its flexibility and its potential for deployment on recent open-source ALMs.
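For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so the "real" and "fake" classes count equally regardless of how many samples each has. The sketch below computes it in plain Python; the labels and predictions are made-up toy data, not results from the paper, and the function names are illustrative only.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # By convention here, a class with no true or predicted samples scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             classes: Sequence[str] = ("real", "fake")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example (invented data): one real clip is misclassified as fake.
toy_true = ["real", "real", "fake", "fake"]
toy_pred = ["real", "fake", "fake", "fake"]
toy_score = macro_f1(toy_true, toy_pred)
print(f"macro F1 on toy data: {toy_score:.3f}")
```

Under this definition, a relative improvement of 2× would mean the new macro F1 is twice the baseline's value; that reading of "relative improvement" is an assumption, since the summary does not spell it out.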

📝 Abstract

Audio deepfakes pose a significant security threat, yet current state-of-the-art (SOTA) detection systems do not generalize well to realistic in-the-wild deepfakes. We introduce a novel In-Context Learning paradigm with comparison-guidance for Audio Deepfake detection (ICLAD). The framework enables the use of audio language models (ALMs) for training-free generalization to unseen deepfakes and provides textual rationales on the detection outcome. At the core of ICLAD is a pairwise comparative reasoning strategy that guides the ALM to discover and filter hallucinations and deepfake-irrelevant acoustic attributes. The ALM works alongside a specialized deepfake detector, whereby a routing mechanism feeds out-of-distribution samples to the ALM. On in-the-wild datasets, ICLAD improves macro F1 over the specialized detector, with up to 2× relative improvement. Further analysis demonstrates the flexibility of ICLAD and its potential for deployment on recent open-source ALMs.
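The routing idea in the abstract can be pictured with a minimal sketch. The abstract does not say how out-of-distribution samples are identified or how the ALM is prompted, so the `ood_score` callable, the `ood_threshold` value, the 0.5 decision boundary, and every name below are hypothetical placeholders rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    label: str      # "real" or "fake"
    source: str     # which component decided: "detector" or "alm"
    rationale: str  # textual rationale; empty when the detector decides


def route_sample(
    sample: object,
    detector: Callable[[object], float],        # assumed to return P(fake) in [0, 1]
    ood_score: Callable[[object], float],       # assumed: higher means more out-of-distribution
    alm: Callable[[object], Tuple[str, str]],   # assumed to return (label, rationale)
    ood_threshold: float = 0.5,                 # hypothetical cut-off
) -> Verdict:
    """Send out-of-distribution samples to the ALM, the rest to the detector."""
    if ood_score(sample) > ood_threshold:
        label, rationale = alm(sample)
        return Verdict(label=label, source="alm", rationale=rationale)
    p_fake = detector(sample)
    label = "fake" if p_fake >= 0.5 else "real"
    return Verdict(label=label, source="detector", rationale="")


# Stub components standing in for real models (illustration only).
def stub_detector(sample: object) -> float:
    return 0.9


def stub_alm(sample: object) -> Tuple[str, str]:
    return ("fake", "placeholder rationale text")


in_dist = route_sample("clip_a", stub_detector, lambda s: 0.1, stub_alm)
out_dist = route_sample("clip_b", stub_detector, lambda s: 0.8, stub_alm)
print(in_dist)
print(out_dist)
```

Passing the components in as callables keeps the sketch independent of any particular detector or ALM, which matches the abstract's claim that the framework is flexible about the ALM it uses.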

Problem

Research questions and friction points this paper is trying to address.

audio deepfake detection

generalization

in-the-wild deepfakes

security threat

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning

Audio Deepfake Detection

Comparison-Guided Reasoning

Audio Language Models

Training-Free Generalization

🔎 Similar Papers

Audio Anti-Spoofing Detection: A Survey

2024-04-22 · arXiv.org · Citations: 25

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

2024-09-23 · arXiv.org · Citations: 1