🤖 AI Summary
To address the need for real-time deepfake speech detection, this work proposes an efficient architecture that replaces self-attention with bidirectional Mamba blocks. Methodologically, it employs XLSR-Wav2Vec as the front-end acoustic representation and introduces three novel bidirectional Mamba-based encoders (TransBiMamba, ConBiMamba, and PN-BiMamba) that jointly capture local fine-grained artifacts and global contextual cues, model long-range temporal dependencies, and keep inference latency low. Experiments on the ASVspoof2021 Logical Access (LA), DeepFake (DF), and In-The-Wild benchmarks yield EERs of 0.97%, 1.74%, and 5.85%, respectively, substantially outperforming state-of-the-art models including XLSR-Conformer and XLSR-Mamba. The proposed approach combines high accuracy, strong generalization across diverse spoofing attacks and recording conditions, and practical deployability in real-time scenarios.
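To build intuition for the "bidirectional Mamba" idea above, here is a deliberately simplified sketch. A real Mamba block uses an input-dependent selective state-space scan with learned parameters; below, a fixed exponential-decay recurrence stands in for that scan purely to illustrate how a forward and a backward pass are combined so every frame sees both past and future context. The function names (`causal_scan`, `bidirectional_scan`) and the decay constant are illustrative assumptions, not taken from the paper's code.

```python
def causal_scan(xs, decay=0.5):
    """Left-to-right recurrence h_t = decay * h_{t-1} + x_t
    (a toy stand-in for a selective state-space scan)."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out

def bidirectional_scan(xs, decay=0.5):
    """Run the scan in both directions and sum the results, so each
    output position aggregates both preceding and following frames."""
    fwd = causal_scan(xs, decay)
    bwd = causal_scan(xs[::-1], decay)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

feats = [1.0, 0.0, 0.0, 2.0]  # toy frame-level features
print(bidirectional_scan(feats))  # → [2.25, 1.0, 1.25, 4.125]
```

Note how the first output (2.25) already reflects the large value at the end of the sequence, which a purely causal scan would never see; this global receptive field at linear cost is what motivates replacing self-attention.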
📝 Abstract
Advances in speech synthesis intensify security threats, motivating real-time deepfake detection research. We investigate whether bidirectional Mamba can serve as a competitive alternative to Self-Attention in detecting synthetic speech. Our solution, Fake-Mamba, integrates an XLSR front-end with bidirectional Mamba to capture both local and global artifacts. Our core innovation introduces three efficient encoders: TransBiMamba, ConBiMamba, and PN-BiMamba. Leveraging XLSR's rich linguistic representations, PN-BiMamba can effectively capture the subtle cues of synthetic speech. Evaluated on ASVspoof 21 LA, 21 DF, and In-The-Wild benchmarks, Fake-Mamba achieves 0.97%, 1.74%, and 5.85% EER, respectively, representing substantial relative gains over SOTA models XLSR-Conformer and XLSR-Mamba. The framework maintains real-time inference across utterance lengths, demonstrating strong generalization and practical viability. The code is available at https://github.com/xuanxixi/Fake-Mamba.
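The EER figures quoted above are Equal Error Rates: the operating point where the false-acceptance rate (spoof accepted as bona fide) equals the false-rejection rate (bona fide rejected). As a rough illustration, the toy function below sweeps score thresholds and returns the average of the two rates where they cross; official ASVspoof evaluations use the organizers' toolkit, and the scores and function name here are illustrative assumptions.

```python
def eer(bonafide_scores, spoof_scores):
    """Approximate EER: sweep thresholds over all observed scores and
    return the mean of FAR and FRR at the point where they are closest."""
    thresholds = sorted(bonafide_scores + spoof_scores)
    best_gap, best_eer = 1.0, 1.0
    for t in thresholds:
        # spoof scored >= t is falsely accepted; bona fide < t is falsely rejected
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy detector scores (higher = more likely bona fide), not real system output.
print(eer([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1]))  # → 0.25
```

A lower EER means the detector separates bona fide and spoofed speech more cleanly, which is why the 0.97% / 1.74% / 5.85% results above indicate improvements over the XLSR-Conformer and XLSR-Mamba baselines.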