🤖 AI Summary
This work addresses the limited generalization and lack of interpretability in current deepfake audio detection methods by proposing HIR-SDD, a novel framework that integrates human-inspired reasoning with large audio language models (LALMs). For the first time, it leverages a human-like chain of thought, constructed from human-annotated data, to enable robust and interpretable deepfake detection. The approach not only achieves high accuracy in distinguishing real from synthetic speech but also generates human-understandable justifications for its predictions. Experimental results demonstrate that HIR-SDD significantly outperforms existing methods in both detection accuracy and cross-domain generalization, while providing transparent and logically sound explanations for its decisions.
📝 Abstract
Modern generative audio models can be exploited by adversaries in unlawful ways, in particular to impersonate other people and gain access to private information. To mitigate this threat, speech deepfake detection (SDD) methods have begun to evolve. Unfortunately, current SDD methods generally suffer from poor generalization to new audio domains and generators. Moreover, they lack interpretability, especially the human-like reasoning that would naturally explain the attribution of a given audio sample to the bona fide or spoof class and provide human-perceptible cues. In this paper, we propose HIR-SDD, a novel SDD framework that combines the strengths of Large Audio Language Models (LALMs) with chain-of-thought reasoning derived from a newly proposed human-annotated dataset. Experimental evaluation demonstrates both the effectiveness of the proposed method and its ability to provide reasonable justifications for its predictions.