$\mathcal{A}LLM4ADD$: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Audio deepfake detection (ADD) degrades under data-scarce conditions, particularly when few labeled forged samples are available. Method: This paper pioneers the application of Audio Large Language Models (ALLMs) to ADD. After observing that their zero-shot capability is insufficient, the authors propose ALLM4ADD, a framework that reformulates ADD as an audio question-answering task (“Is this audio fake or real?”). It uses instruction prompting and supervised fine-tuning to elicit fine-grained authenticity discrimination from ALLMs, combining audio representation learning with linguistic reasoning without requiring extensive labeled forgery data. Contribution/Results: Evaluated on multi-source deepfake datasets, ALLM4ADD outperforms conventional methods, with the largest accuracy gains in low-resource settings. The results support both the efficacy of the ALLM-driven ADD paradigm and its cross-domain generalization capability.

📝 Abstract
Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio large language models (ALLMs) have made significant progress in various audio processing tasks, a heuristic question arises: Can ALLMs be leveraged to solve ADD? In this paper, we first conduct a comprehensive zero-shot evaluation of ALLMs on ADD, revealing their ineffectiveness in detecting fake audio. To enhance their performance, we propose $\mathcal{A}LLM4ADD$, an ALLM-driven framework for ADD. Specifically, we reformulate the ADD task as an audio question answering problem, prompting the model with the question: "Is this audio fake or real?" We then perform supervised fine-tuning to enable the ALLM to assess the authenticity of query audio. Extensive experiments demonstrate that our ALLM-based method achieves superior performance in fake audio detection, particularly in data-scarce scenarios. As a pioneering study, we anticipate that this work will inspire the research community to leverage ALLMs to develop more effective ADD systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluating audio LLMs for deepfake detection effectiveness
Proposing a framework to reformulate detection as Q&A
Enhancing fake audio detection in data-scarce scenarios
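
A zero-shot evaluation of this kind has to turn each free-form model reply into a real/fake decision before it can be scored. The sketch below shows one simple way to do that; it is an illustration only, not the paper's evaluation protocol. The keyword-matching rule, the function names `parse_answer` and `accuracy`, and the choice to count ambiguous replies as errors are all assumptions.

```python
import re


def parse_answer(reply):
    """Map a free-form reply to 'fake', 'real', or None when it is ambiguous.

    Assumption: a reply counts as a decision only if it mentions exactly
    one of the two words 'fake' and 'real'.
    """
    words = set(re.findall(r"[a-z]+", reply.lower()))
    has_fake = "fake" in words
    has_real = "real" in words
    if has_fake == has_real:  # both mentioned, or neither
        return None
    return "fake" if has_fake else "real"


def accuracy(replies, labels):
    """Fraction of replies whose parsed decision matches the label.

    Ambiguous replies (parsed as None) are counted as wrong.
    """
    if not labels or len(replies) != len(labels):
        raise ValueError("replies and labels must be non-empty and equal in length")
    correct = sum(parse_answer(r) == y for r, y in zip(replies, labels))
    return correct / len(labels)
```

Counting unparseable replies as errors is a conservative choice; an alternative would be to report them separately as a refusal or ambiguity rate.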
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulate ADD as audio question answering
Supervised fine-tuning for authenticity assessment
ALLM-driven framework for superior detection
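
The reformulation above can be pictured as converting each labeled clip into a question-answer record for supervised fine-tuning. The following is a minimal sketch under stated assumptions: the question text comes from the abstract, but the record fields (`audio`, `question`, `answer`), the one-word answer vocabulary, the JSON Lines layout, and the example file paths are hypothetical and not taken from the paper.

```python
import json

# Question wording quoted from the abstract.
PROMPT = "Is this audio fake or real?"


def build_sample(audio_path, is_fake):
    """Build one audio question-answering record (hypothetical schema)."""
    return {
        "audio": audio_path,
        "question": PROMPT,
        "answer": "fake" if is_fake else "real",  # assumed answer vocabulary
    }


def to_jsonl(samples):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(s) for s in samples)


if __name__ == "__main__":
    # Hypothetical clip paths, for illustration only.
    records = [
        build_sample("clips/utt_0001.wav", is_fake=True),
        build_sample("clips/utt_0002.wav", is_fake=False),
    ]
    print(to_jsonl(records))
```

Keeping the target answer to a single word makes the fine-tuned model's output easy to map back to a binary decision at inference time.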