MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost of fully fine-tuning self-supervised learning (SSL) models for audio deepfake detection, this paper proposes a parameter-efficient fine-tuning method that integrates Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) router. The router activates experts adaptively while the backbone parameters stay frozen: it reactivates the same experts for attacks similar to those seen during training, switches to other experts for novel spoofs, and allows extra experts to be trained for new domains without modifying the rest of the model. Evaluated on the ASVspoof 5 dataset without data augmentation, the method achieves a state-of-the-art equal error rate (EER) of 5.56% on the evaluation set. The core contribution is jointly leveraging LoRA and MoE with domain-aware routing for audio deepfake detection, balancing model efficiency, cross-domain generalizability, and scalability.
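
As a rough illustration of the idea, the sketch below wraps a frozen linear projection (the kind found inside an SSL encoder such as wav2vec 2.0) with several rank-r LoRA experts and a token-wise top-k router. Class and parameter names (`MoLExLinear`, `num_experts`, `rank`, `top_k`) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoLExLinear(nn.Module):
    """Illustrative Mixture-of-LoRA-Experts wrapper around a frozen linear layer."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8, top_k: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():               # pre-trained weights stay frozen
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Each expert is a rank-r LoRA pair: A projects down, B projects back up.
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.router = nn.Linear(d_in, num_experts)     # token-wise routing scores
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, frames, d_in)
        y = self.base(x)                                  # frozen pre-trained path
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        # Low-rank updates from all experts, then keep only the top-k per token.
        delta = torch.einsum("bsd,edr,ero->bseo", x, self.A, self.B)
        picked = torch.gather(delta, 2, idx.unsqueeze(-1).expand(*idx.shape, delta.size(-1)))
        return y + (weights.unsqueeze(-1) * picked).sum(dim=2)

# e.g. hidden states of shape (batch, frames, 768) from a wav2vec-style encoder
layer = MoLExLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 100, 768))
```

In this sketch only the LoRA matrices and the router receive gradients, which is where the parameter-efficiency claim comes from; the frozen base layer could be any projection inside the pre-trained backbone.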

📝 Abstract
While self-supervised learning (SSL)-based models have boosted audio deepfake detection accuracy, fully finetuning them is computationally expensive. To address this, we propose a parameter-efficient framework that combines Low-Rank Adaptation with a Mixture-of-Experts router, called Mixture of LoRA Experts (MoLEx). It preserves pre-trained knowledge of SSL models while efficiently finetuning only selected experts, reducing training costs while maintaining robust performance. The observed utility of experts during inference shows the router reactivates the same experts for similar attacks but switches to other experts for novel spoofs, confirming MoLEx's domain-aware adaptability. MoLEx additionally offers flexibility for domain adaptation by allowing extra experts to be trained without modifying the entire model. We mainly evaluate our approach on the ASVSpoof 5 dataset and achieve the state-of-the-art (SOTA) equal error rate (EER) of 5.56% on the evaluation set without augmentation.
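
The abstract's observation about expert utility can be probed by simply tallying which experts the router selects for utterances from different attack conditions. The snippet below is a self-contained sketch of that kind of check, not the paper's analysis: the random tensors stand in for SSL hidden states, and the function name `expert_usage` is an assumption.

```python
from collections import Counter

import torch
import torch.nn as nn

def expert_usage(router: nn.Linear, hidden: torch.Tensor, top_k: int = 2) -> Counter:
    """Count how often each expert is selected by a token-wise router.

    hidden: (batch, frames, d_model) hidden states for one attack condition.
    """
    with torch.no_grad():
        idx = router(hidden).softmax(-1).topk(top_k, dim=-1).indices
    return Counter(idx.flatten().tolist())

# Hypothetical comparison of a familiar and a novel attack (random stand-in features).
router = nn.Linear(768, 4)
known_attack = torch.randn(8, 100, 768)
novel_spoof = torch.randn(8, 100, 768)
print("known:", expert_usage(router, known_attack))
print("novel:", expert_usage(router, novel_spoof))
```

Stable counts across attacks of the same family, and a shift for unseen spoofs, would be the kind of evidence the abstract describes as domain-aware adaptability.
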
Problem

Research questions and friction points this paper is trying to address.

Efficiently finetune SSL models for audio deepfake detection
Reduce computational costs while maintaining detection performance
Enable domain-aware adaptability against novel spoofing attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Low-Rank Adaptation with Mixture-of-Experts
Efficiently finetunes only selected domain-specific experts
Allows adding new experts without modifying the entire model (see the sketch after this list)
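
A minimal sketch of that last point, assuming experts are kept in an `nn.ModuleList` and the router is a plain linear layer: adapting to a new domain adds one LoRA expert and one router output while everything already trained stays frozen. The names `LoRAExpert` and `add_domain_expert` are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One rank-r LoRA adapter: x -> x @ A @ B."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A @ self.B

def add_domain_expert(experts: nn.ModuleList, router: nn.Linear, d_in: int, d_out: int) -> nn.Linear:
    """Grow the expert pool by one and widen the router; freeze what is already trained."""
    for old in experts:                          # previously trained experts stay fixed
        for p in old.parameters():
            p.requires_grad = False
    experts.append(LoRAExpert(d_in, d_out))
    new_router = nn.Linear(d_in, len(experts))
    with torch.no_grad():                        # carry over the learned routing weights
        new_router.weight[: router.out_features].copy_(router.weight)
        new_router.bias[: router.out_features].copy_(router.bias)
    return new_router

experts = nn.ModuleList([LoRAExpert(768, 768) for _ in range(4)])
router = add_domain_expert(experts, nn.Linear(768, 4), 768, 768)  # now 5 experts, 5 routes
```
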
👥 Authors

Zihan Pan
Agency for Science, Technology and Research (A*STAR), Singapore
Audio Deep Fake · Large Language Model · Speech Representation Model · Paralinguistic AI

Sailor Hardik Bhupendra
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, 138634

Jinyang Wu
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, 138634