🤖 AI Summary
Existing PEFT methods neglect the dynamic expert routing mechanism inherent in Mixture-of-Experts (MoE) models, leading to architectural misalignment between adaptation modules and the underlying MoE structure. To address this, we propose *Routed-PEFT*, the first PEFT framework that explicitly incorporates expert routing into adapter design by dynamically assigning dedicated adapters to individual experts, enabling joint optimization of expert specialization and task-specific adaptation. We systematically evaluate Routed-PEFT on the OLMoE and Mixtral architectures, integrating it with LoRA, Adapter, and other PEFT variants under diverse routing strategies across commonsense and mathematical reasoning benchmarks. With only 0.1%–0.5% additional trainable parameters, Routed-PEFT achieves average accuracy gains of 2.3–5.7 percentage points over standard PEFT baselines. Moreover, our analysis uncovers task-dependent optimal routing configurations, establishing a novel paradigm for efficient fine-tuning of MoE models.
📝 Abstract
Mixture-of-Experts (MoE) models benefit from a dynamic routing mechanism among their specialized experts, which existing Parameter-Efficient Fine-Tuning (PEFT) strategies fail to leverage. This motivates us to investigate whether adaptation modules themselves should incorporate routing mechanisms to align with MoE's multi-expert architecture. We analyze the dynamics of core components when applying PEFT to MoE language models and examine how different routing strategies affect adaptation effectiveness. Extensive experiments adapting OLMoE-1B-7B and Mixtral-8x7B on various commonsense and math reasoning tasks validate the performance and efficiency of our routed approach. We identify the optimal configurations for different scenarios and provide empirical analyses with practical insights to facilitate better PEFT and MoE applications.
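To make the idea of a routed adaptation module concrete, the following is a minimal sketch of what one could look like. It is not the authors' implementation: the abstract does not specify the adapter design, so the class name `RoutedLoRA`, the top-k softmax router, the renormalized gate weights, and all dimensions are assumptions chosen for illustration. Each token is scored by a small router, the top-k LoRA pairs are selected, and their low-rank updates are mixed by the gate weights.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RoutedLoRA:
    """Hypothetical routed adapter: a router picks top-k of several LoRA pairs per token."""

    def __init__(self, d_model, rank, num_adapters, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score per adapter, computed from the token representation.
        self.router = rand_matrix(num_adapters, d_model, rng, 0.02)
        # Down-projections A_i (rank x d_model), small random init.
        self.A = [rand_matrix(rank, d_model, rng, 0.02) for _ in range(num_adapters)]
        # Up-projections B_i (d_model x rank), zero init as in standard LoRA,
        # so the adapter contributes nothing before training.
        self.B = [[[0.0] * rank for _ in range(d_model)] for _ in range(num_adapters)]

    def route(self, x):
        """Return [(adapter_index, gate_weight)] for the top-k adapters."""
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        total = sum(probs[i] for i in top)
        # Renormalize so the selected gate weights sum to 1 (an assumption).
        return [(i, probs[i] / total) for i in top]

    def delta(self, x):
        """Gate-weighted sum of the selected low-rank updates B_i A_i x."""
        out = [0.0] * len(x)
        for i, w in self.route(x):
            update = matvec(self.B[i], matvec(self.A[i], x))
            out = [o + w * u for o, u in zip(out, update)]
        return out


layer = RoutedLoRA(d_model=8, rank=2, num_adapters=4, top_k=2)
x = [0.1 * i for i in range(8)]
selected = layer.route(x)  # two (index, weight) pairs
update = layer.delta(x)    # would be added to the frozen layer's output
```

In this sketch only the router and the `A`/`B` matrices would be trained, while the pretrained MoE weights stay frozen. Whether the adapter reuses the base model's router or learns its own, and how many adapters are active per token, are the kinds of routing choices the abstract says are compared.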