HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost of full-parameter fine-tuning and the severe source-domain performance degradation it causes in LLM-based ASR models during multi-accent domain adaptation, this paper proposes a LoRA-MoE hybrid architecture. Methodologically, it integrates low-rank adaptation (LoRA) with a mixture of experts (MoE) to achieve parameter-efficient, domain-generalizable adaptation. Key contributions include: (1) a novel hierarchical routing mechanism that explicitly maps LoRA experts to accent domains; and (2) a dynamic gating threshold replacing static Top-K selection, enabling adaptive expert activation at each MoE layer. The framework is applicable to arbitrary linear layers. Experiments on multi-accent and standard Mandarin ASR benchmarks demonstrate that HDMoLE achieves target-domain accuracy comparable to full fine-tuning using only 9.6% of the trainable parameters, while preserving source-domain performance with negligible degradation (<0.3% accuracy loss).
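The dynamic-threshold idea in contribution (2) can be contrasted with static Top-K in a few lines. This is a minimal sketch, not the paper's implementation: the function name `select_experts` and the parameter names `tau` and `k` are illustrative assumptions.

```python
def select_experts(gates, tau=None, k=None):
    """Pick active experts from normalized router weights `gates`.

    With `k`, the conventional static Top-K selection; with `tau`, every
    expert whose gate weight clears the threshold fires, so the number of
    active experts varies per input -- the dynamic-threshold behavior the
    summary describes.
    """
    indexed = sorted(enumerate(gates), key=lambda p: p[1], reverse=True)
    if k is not None:
        return [i for i, _ in indexed[:k]]
    return [i for i, g in indexed if g >= tau]
```

For a peaked gate distribution the threshold may activate a single expert, while a flatter distribution activates several; Top-K always activates exactly `k`.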

📝 Abstract
Recent advancements in integrating Large Language Models (LLM) with automatic speech recognition (ASR) have performed remarkably well in general domains. While supervised fine-tuning (SFT) of all model parameters is often employed to adapt pre-trained LLM-based ASR models to specific domains, it imposes high computational costs and notably reduces their performance in general domains. In this paper, we propose HDMoLE, a novel parameter-efficient multi-domain fine-tuning method for adapting pre-trained LLM-based ASR models to multi-accent domains without catastrophic forgetting, which leverages hierarchical routing and dynamic thresholds, combines low-rank adaptation (LoRA) with the mixture of experts (MoE), and can be generalized to any linear layer. Hierarchical routing establishes a clear correspondence between LoRA experts and accent domains, improving cross-domain collaboration among the LoRA experts. Unlike the static Top-K strategy for activating LoRA experts, dynamic thresholds can adaptively activate varying numbers of LoRA experts at each MoE layer. Experiments on the multi-accent and standard Mandarin datasets demonstrate the efficacy of HDMoLE. Applying HDMoLE to the projector module of an LLM-based ASR model achieves performance similar to full fine-tuning in the target multi-accent domains while using only 9.6% of the trainable parameters required for full fine-tuning, with minimal degradation in the source general domain.
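The abstract's core mechanism, frozen linear layers augmented by gated LoRA experts with threshold-based activation, can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `MoLELinear`, the single-level router, and all hyperparameters are invented for the example, and the paper's hierarchical (accent-level plus expert-level) routing is collapsed into one router.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class MoLELinear:
    """Frozen linear layer wrapped with a mixture of LoRA experts.

    Each expert i holds low-rank factors A_i (r x d_in) and B_i (d_out x r).
    A router scores the experts and, instead of static Top-K, every expert
    whose normalized gate weight clears `tau` is activated, so the active
    count varies per input. B is zero-initialized, so untrained experts
    leave the frozen base output unchanged (standard LoRA initialization).
    """

    def __init__(self, d_in, d_out, n_experts=4, r=2, tau=0.2, seed=0):
        rng = random.Random(seed)
        g = lambda rows, cols: [[rng.gauss(0, 0.1) for _ in range(cols)]
                                for _ in range(rows)]
        self.W = g(d_out, d_in)                       # frozen base weight
        self.A = [g(r, d_in) for _ in range(n_experts)]
        self.B = [[[0.0] * r for _ in range(d_out)]   # zero-init: experts
                  for _ in range(n_experts)]          # start as a no-op
        self.router = g(n_experts, d_in)
        self.tau = tau

    def forward(self, x):
        gates = softmax(matvec(self.router, x))
        y = matvec(self.W, x)                         # frozen base path
        for g_i, A_i, B_i in zip(gates, self.A, self.B):
            if g_i >= self.tau:                       # dynamic threshold
                delta = matvec(B_i, matvec(A_i, x))   # low-rank update B A x
                y = [yj + g_i * dj for yj, dj in zip(y, delta)]
        return y
```

Because only the `A`, `B`, and router matrices would be trained, the trainable-parameter count stays a small fraction of the base layer's, which is the source of the 9.6% figure the abstract reports for the projector module.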
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Supervised Fine-tuning
Computational Cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

HDMoLE
LoRA Expert System
Parameter Efficiency