🤖 AI Summary
Problem: Audio deepfake detection (ADD) degrades under data-scarce conditions, particularly when few labeled forged samples are available. Method: This paper pioneers the application of Audio Large Language Models (ALLMs) to ADD. Finding their zero-shot capability insufficient, the authors propose ALLM4ADD, a novel framework that reformulates ADD as an audio question-answering task (“Is this audio fake or real?”). It uses instruction prompting and supervised fine-tuning to elicit fine-grained authenticity discrimination from ALLMs, integrating audio representation learning with linguistic reasoning without requiring extensive labeled forgery data. Contribution/Results: Evaluated on multi-source deepfake datasets, ALLM4ADD significantly outperforms conventional methods, with substantial accuracy gains in low-resource settings. The results demonstrate both the efficacy of the ALLM-driven ADD paradigm and its strong cross-domain generalization capability.
📝 Abstract
Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio large language models (ALLMs) have made significant progress in various audio processing tasks, a natural question arises: Can ALLMs be leveraged to solve ADD? In this paper, we first conduct a comprehensive zero-shot evaluation of ALLMs on ADD, revealing that they are ineffective at detecting fake audio. To enhance their performance, we propose $\mathcal{A}$LLM4ADD, an ALLM-driven framework for ADD. Specifically, we reformulate the ADD task as an audio question answering problem, prompting the model with the question: "Is this audio fake or real?". We then perform supervised fine-tuning to enable the ALLM to assess the authenticity of the query audio. Extensive experiments demonstrate that our ALLM-based method achieves superior performance in fake audio detection, particularly in data-scarce scenarios. As a pioneering study, we anticipate that this work will inspire the research community to leverage ALLMs to develop more effective ADD systems.
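To make the question-answering reformulation concrete, here is a minimal sketch of how labeled ADD clips might be turned into fine-tuning pairs and how a generated reply might be mapped back to a binary label. Only the prompt text comes from the abstract; the `QASample` record, the single-word "fake"/"real" targets, and the helpers `make_sft_sample`, `parse_answer`, and `accuracy` are hypothetical illustrations, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Prompt taken from the abstract; the sample format below is an assumption.
QUESTION = "Is this audio fake or real?"

@dataclass
class QASample:
    audio_path: str  # path to the query audio clip (hypothetical field)
    question: str    # instruction prompt given to the ALLM
    answer: str      # target text for supervised fine-tuning (assumed single word)

def make_sft_sample(audio_path: str, is_fake: bool) -> QASample:
    """Convert one labeled ADD example into an audio question-answering pair."""
    return QASample(audio_path, QUESTION, "fake" if is_fake else "real")

def parse_answer(generated: str) -> Optional[bool]:
    """Map the model's free-text reply to a label (True = fake, False = real).

    Uses the first whole-word "fake" or "real" in the reply and returns None
    if neither appears. Negations such as "not fake" are NOT handled; this is
    adequate only when the model is trained to answer with one word.
    """
    match = re.search(r"\b(fake|real)\b", generated.lower())
    if match is None:
        return None
    return match.group(1) == "fake"

def accuracy(replies: List[str], labels: List[bool]) -> float:
    """Fraction of replies whose parsed label matches the ground truth.

    Unparseable replies (None) never equal a bool, so they count as errors.
    """
    correct = sum(parse_answer(r) == y for r, y in zip(replies, labels))
    return correct / len(labels)
```

Keeping the target answer to a single word makes decoding trivial to score. A working pipeline would additionally need the ALLM's audio encoder and its model-specific prompt template, which this sketch deliberately leaves out.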