🤖 AI Summary
Existing MoE-LoRA methods rely on discrete routers, which prevent the expert modules from being fully merged into the backbone model and incur non-negligible inference overhead. To address this, we propose FURINA, a router-free Mixture-of-Experts Low-Rank Adaptation framework. FURINA eliminates explicit routing through direction-magnitude-decoupled LoRA adapters, an angle-similarity-based self-routing mechanism, shared magnitude-vector scaling, and a sparsity-driven expert selection loss, enabling dynamic expert activation within an end-to-end mergeable architecture. Empirically, FURINA significantly outperforms standard LoRA across multiple tasks, matches or exceeds state-of-the-art MoE-LoRA methods, eliminates routing computation entirely, supports zero-cost model merging, and, crucially, achieves the first seamless, unified deployment of MoE-LoRA within the backbone model.
📝 Abstract
The Mixture of Experts (MoE) paradigm has been successfully integrated into Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning (PEFT), delivering performance gains with minimal parameter overhead. However, a key limitation of existing MoE-LoRA methods is their reliance on a discrete router, which prevents the MoE components from being merged into the backbone model. To overcome this, we propose FURINA, a novel Free from Unmergeable Router framework based on the LINear Aggregation of experts. FURINA eliminates the router by introducing a Self-Routing mechanism, achieved through three core innovations: (1) decoupled learning of the direction and magnitude of LoRA adapters, (2) a shared learnable magnitude vector for consistent activation scaling, and (3) an expert selection loss that encourages divergent expert activation. The proposed mechanism leverages the angular similarity between the input and each adapter's directional component to activate experts, whose outputs are then scaled by the shared magnitude vector. This design allows the output norm to naturally reflect the importance of each expert, thereby enabling dynamic, router-free expert selection. The expert selection loss further sharpens this behavior by encouraging sparsity and aligning it with standard MoE activation patterns. We also introduce a shared expert within the MoE-LoRA block that provides stable, foundational knowledge. To the best of our knowledge, FURINA is the first router-free, MoE-enhanced LoRA method that can be fully merged into the backbone model, introducing zero additional inference-time cost or complexity. Extensive experiments demonstrate that FURINA not only significantly outperforms standard LoRA but also matches or surpasses the performance of existing MoE-LoRA methods, while eliminating the extra inference-time overhead of MoE.
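To make the self-routing idea concrete, the following is a minimal NumPy sketch of how angular-similarity-based, router-free expert activation could work. It is an illustrative assumption, not the paper's implementation: all names (`furina_forward`, `A_list`, `B_list`, `m`), shapes, and the hard `top_k` truncation (which the paper replaces with a soft expert selection loss during training) are hypothetical.

```python
import numpy as np

def furina_forward(x, A_list, B_list, m, top_k=None):
    """Hypothetical sketch of router-free self-routing (not the paper's code).

    Each expert i has LoRA factors A_i (r x d_in) and B_i (d_out x r).
    Instead of a learned router, each expert's activation strength emerges
    from the angular similarity between the input x and the expert's
    directional component (row-normalized A_i); the aggregated output is
    then rescaled by a shared magnitude vector m.
    """
    outputs, norms = [], []
    for A, B in zip(A_list, B_list):
        # Row-normalize A so it encodes direction only; the dot product
        # with x then reflects cosine (angular) similarity per rank row.
        A_dir = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)
        h = A_dir @ x              # per-rank activation strengths
        y = B @ h                  # expert output; its norm signals importance
        outputs.append(y)
        norms.append(np.linalg.norm(y))
    # Optional sparsification: keep only the top-k experts by output norm,
    # mimicking standard MoE activation patterns (a hard stand-in for the
    # sparsity-encouraging expert selection loss).
    if top_k is not None:
        keep = np.argsort(norms)[-top_k:]
        outputs = [outputs[i] for i in keep]
    # Linear aggregation of experts, scaled by the shared magnitude vector.
    # Because everything is linear in x, the sum can be merged into the
    # backbone weight at deployment time.
    return m * np.sum(outputs, axis=0)
```

Because every expert contributes a linear map of the input and the selection signal is implicit in the output norms, the aggregated update can be folded into the backbone weight matrix, which is what makes the architecture mergeable with zero inference-time routing cost.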