HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost of full-parameter fine-tuning and the severe source-domain performance degradation it causes in LLM-based ASR models during multi-accent domain adaptation, this paper proposes a LoRA-MoE hybrid architecture. Methodologically, it integrates low-rank adaptation (LoRA) with a mixture of experts (MoE) to achieve parameter-efficient, domain-generalizable adaptation. Key contributions include: (1) a novel hierarchical routing mechanism that explicitly maps LoRA experts to accent domains; and (2) a dynamic gating threshold replacing static Top-K selection, enabling adaptive expert activation at each MoE layer. The framework is applicable to arbitrary linear layers. Experiments on multi-accent and standard Mandarin ASR benchmarks demonstrate that HDMoLE achieves target-domain accuracy comparable to full fine-tuning using only 9.6% of the trainable parameters, while preserving source-domain performance with negligible degradation (<0.3% accuracy loss).
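The dynamic-threshold idea in contribution (2) can be contrasted with static Top-K in a few lines. This is a minimal sketch, not the paper's implementation: the function name `select_experts` and the parameter names `tau` and `k` are illustrative assumptions.

```python
def select_experts(gates, tau=None, k=None):
    """Pick active experts from normalized router weights `gates`.

    With `k`, the conventional static Top-K selection; with `tau`, every
    expert whose gate weight clears the threshold fires, so the number of
    active experts varies per input -- the dynamic-threshold behavior the
    summary describes.
    """
    indexed = sorted(enumerate(gates), key=lambda p: p[1], reverse=True)
    if k is not None:
        return [i for i, _ in indexed[:k]]
    return [i for i, g in indexed if g >= tau]
```

For a peaked gate distribution the threshold may activate a single expert, while a flatter distribution activates several; Top-K always activates exactly `k`.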

📝 Abstract
Recent advancements in integrating Large Language Models (LLM) with automatic speech recognition (ASR) have performed remarkably well in general domains. While supervised fine-tuning (SFT) of all model parameters is often employed to adapt pre-trained LLM-based ASR models to specific domains, it imposes high computational costs and notably reduces their performance in general domains. In this paper, we propose HDMoLE, a novel parameter-efficient multi-domain fine-tuning method for adapting pre-trained LLM-based ASR models to multi-accent domains without catastrophic forgetting, which leverages hierarchical routing and dynamic thresholds, combines low-rank adaptation (LoRA) with the mixture of experts (MoE), and can be generalized to any linear layer. Hierarchical routing establishes a clear correspondence between LoRA experts and accent domains, improving cross-domain collaboration among the LoRA experts. Unlike the static Top-K strategy for activating LoRA experts, dynamic thresholds can adaptively activate varying numbers of LoRA experts at each MoE layer. Experiments on the multi-accent and standard Mandarin datasets demonstrate the efficacy of HDMoLE. Applying HDMoLE to the projector module of an LLM-based ASR model achieves performance similar to full fine-tuning in the target multi-accent domains while using only 9.6% of the trainable parameters required for full fine-tuning, with minimal degradation in the source general domain.
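The abstract's core mechanism, frozen linear layers augmented by gated LoRA experts with threshold-based activation, can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `MoLELinear`, the single-level router, and all hyperparameters are invented for the example, and the paper's hierarchical (accent-level plus expert-level) routing is collapsed into one router.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class MoLELinear:
    """Frozen linear layer wrapped with a mixture of LoRA experts.

    Each expert i holds low-rank factors A_i (r x d_in) and B_i (d_out x r).
    A router scores the experts and, instead of static Top-K, every expert
    whose normalized gate weight clears `tau` is activated, so the active
    count varies per input. B is zero-initialized, so untrained experts
    leave the frozen base output unchanged (standard LoRA initialization).
    """

    def __init__(self, d_in, d_out, n_experts=4, r=2, tau=0.2, seed=0):
        rng = random.Random(seed)
        g = lambda rows, cols: [[rng.gauss(0, 0.1) for _ in range(cols)]
                                for _ in range(rows)]
        self.W = g(d_out, d_in)                       # frozen base weight
        self.A = [g(r, d_in) for _ in range(n_experts)]
        self.B = [[[0.0] * r for _ in range(d_out)]   # zero-init: experts
                  for _ in range(n_experts)]          # start as a no-op
        self.router = g(n_experts, d_in)
        self.tau = tau

    def forward(self, x):
        gates = softmax(matvec(self.router, x))
        y = matvec(self.W, x)                         # frozen base path
        for g_i, A_i, B_i in zip(gates, self.A, self.B):
            if g_i >= self.tau:                       # dynamic threshold
                delta = matvec(B_i, matvec(A_i, x))   # low-rank update B A x
                y = [yj + g_i * dj for yj, dj in zip(y, delta)]
        return y
```

Because only the `A`, `B`, and router matrices would be trained, the trainable-parameter count stays a small fraction of the base layer's, which is the source of the 9.6% figure the abstract reports for the projector module.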
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Supervised Fine-tuning
Computational Cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

HDMoLE
LoRA Expert System
Parameter Efficiency