Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing MoE-based multi-task learning (MoE-MTL) approaches rely on single-task pre-trained backbones, which leads to redundant adaptation and inefficient cross-task knowledge sharing during the transition to multi-task settings. To address this, we propose Adaptive Shared Experts (ASE), a LoRA-enhanced MoE framework in which shared experts receive router-computed gating weights normalized jointly with those of sparse task-specific experts, dynamically balancing the two and enabling efficient inter-task knowledge transfer. The method combines Low-Rank Adaptation (LoRA), fine-grained expert expansion, and gate optimization under a strict parameter budget, enhancing both expert specialization and collaboration. Evaluated on the PASCAL-Context benchmark, ASE consistently outperforms strong baselines across multiple configurations, demonstrating substantially improved knowledge sharing and transfer in multi-task learning.

📝 Abstract
Mixture-of-Experts (MoE) has emerged as a powerful framework for multi-task learning (MTL). However, existing MoE-MTL methods often rely on single-task pretrained backbones and suffer from redundant adaptation and inefficient knowledge sharing during the transition from single-task to multi-task learning (STL to MTL). To address these limitations, we propose adaptive shared experts (ASE) within a low-rank adaptation (LoRA) based MoE, where shared experts are assigned router-computed gating weights jointly normalized with those of sparse experts. This design facilitates the STL-to-MTL transition and enhances both expert specialization and cooperation. Furthermore, we incorporate fine-grained experts by increasing the number of LoRA experts while proportionally reducing their rank, enabling more effective knowledge sharing under a comparable parameter budget. Extensive experiments on the PASCAL-Context benchmark, under unified training settings, demonstrate that ASE consistently improves performance across diverse configurations and validate the effectiveness of fine-grained designs for MTL.
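The abstract's core mechanism — shared experts whose router-computed gates are normalized jointly with the top-k sparse experts — can be illustrated with a minimal PyTorch sketch. This is a reading of the abstract, not the authors' code: the class names, layer sizes, and expert counts below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One LoRA expert: a rank-r update B(A(x)) with r << d_model."""
    def __init__(self, d_model, rank):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)
        self.B = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.B.weight)  # standard LoRA init: update starts at zero

    def forward(self, x):
        return self.B(self.A(x))

class AdaptiveSharedMoE(nn.Module):
    """Sketch of ASE: shared experts are always active, but their gating
    weights come from the router and are softmax-normalized *jointly*
    with the selected sparse experts, so all active gates sum to 1."""
    def __init__(self, d_model, n_sparse=8, n_shared=2, rank=4, top_k=2):
        super().__init__()
        self.sparse = nn.ModuleList(LoRAExpert(d_model, rank) for _ in range(n_sparse))
        self.shared = nn.ModuleList(LoRAExpert(d_model, rank) for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_sparse + n_shared)
        self.top_k = top_k
        self.n_sparse = n_sparse

    def forward(self, x):  # x: (batch, d_model)
        logits = self.router(x)
        sparse_logits = logits[:, :self.n_sparse]
        shared_logits = logits[:, self.n_sparse:]
        # keep only the top-k sparse experts; mask the rest to -inf so the
        # joint softmax assigns them zero weight
        topk_val, topk_idx = sparse_logits.topk(self.top_k, dim=-1)
        masked = torch.full_like(sparse_logits, float("-inf"))
        masked = masked.scatter(-1, topk_idx, topk_val)
        # joint normalization: shared + selected sparse gates sum to 1
        gates = F.softmax(torch.cat([masked, shared_logits], dim=-1), dim=-1)
        out = torch.zeros_like(x)
        for i, e in enumerate(self.sparse):  # non-top-k experts get gate 0
            out = out + gates[:, i:i + 1] * e(x)
        for j, e in enumerate(self.shared):
            out = out + gates[:, self.n_sparse + j:self.n_sparse + j + 1] * e(x)
        return out
```

The key design point is the single softmax over both groups: a plain MoE would normalize only the sparse gates and add shared-expert outputs with fixed weight, whereas here the router can trade shared capacity against task-specific capacity per token.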
Problem

Research questions and friction points this paper is trying to address.

Addressing redundant adaptation in multi-task learning transition
Improving knowledge sharing efficiency through adaptive expert design
Enhancing expert specialization with fine-grained LoRA architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive shared experts with router-computed gating weights
LoRA-based mixture of experts for multi-task learning
Fine-grained experts with reduced rank for parameter efficiency
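The fine-grained design keeps the parameter budget fixed by trading expert count against rank: each LoRA expert costs roughly 2·d·r parameters (matrices A and B), so doubling the number of experts while halving the rank leaves the total unchanged. A quick check, with an assumed hidden size of 768 and illustrative expert counts:

```python
def lora_params(d_model, n_experts, rank):
    # each LoRA expert has A: (rank x d_model) and B: (d_model x rank)
    return n_experts * 2 * d_model * rank

coarse = lora_params(768, 4, 8)  # 4 experts at rank 8
fine = lora_params(768, 8, 4)    # 8 experts at rank 4: same budget
assert coarse == fine            # both 49,152 parameters
```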