Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts

πŸ“… 2025-10-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current sparse Mixture-of-Experts (sMoE) routing mechanisms rely on similarity-based scoring, which struggles to capture the intrinsic structure of the input data. This leads to a fundamental trade-off between expert specialization and computational load balancing, limiting model scalability and performance. This paper proposes a decoupled probabilistic routing framework: it explicitly models input-space partitioning with an independently trained probabilistic mixture model, disentangling routing decisions from downstream task optimization, and it adds a dynamic sparsity activation mechanism for domain-aware expert selection. The approach improves both the clarity of expert specialization and the balance of load across experts. Empirical evaluation demonstrates consistent gains over state-of-the-art sMoE baselines on multiple vision-language tasks, in both predictive performance and expert utilization efficiency.

πŸ“ Abstract
Sparse Mixture of Experts (sMoE) has become a pivotal approach for scaling large vision-language models, offering substantial capacity while maintaining computational efficiency through dynamic, sparse activation of experts. However, existing routing mechanisms, typically based on similarity scoring, struggle to effectively capture the underlying input structure. This limitation leads to a trade-off between expert specialization and balanced computation, hindering both scalability and performance. We propose Input Domain Aware MoE, a novel routing framework that leverages a probabilistic mixture model to better partition the input space. By modeling routing probabilities as a mixture of distributions, our method enables experts to develop clear specialization boundaries while achieving balanced utilization. Unlike conventional approaches, our routing mechanism is trained independently of task-specific objectives, allowing for stable optimization and decisive expert assignments. Empirical results on vision-language tasks demonstrate that our method consistently outperforms existing sMoE approaches, achieving higher task performance and improved expert utilization balance.
Problem

Research questions and friction points this paper is trying to address.

Decoupling routing decisions from task optimization in MoE
Improving the trade-off between expert specialization and balanced computation
Enhancing input space partitioning with probabilistic mixture models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses probabilistic mixture model for input partitioning
Decouples routing training from task optimization
Enables expert specialization with balanced utilization
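To make the core idea concrete, here is a minimal sketch of probabilistic-mixture routing with sparse top-k activation. The paper's exact model and training procedure are not given here, so this is an illustration under assumptions: a diagonal-covariance Gaussian mixture over token embeddings, fit by EM on the inputs alone (no task gradient, mirroring the decoupling from task optimization), with posterior responsibilities used as gates. The class name `GMMRouter` and all hyperparameters are hypothetical.

```python
import numpy as np

class GMMRouter:
    """Illustrative decoupled router: tokens are assigned to experts by their
    posterior responsibilities under a diagonal-covariance Gaussian mixture.
    The mixture is fit to input embeddings only (EM), independent of any
    task-specific loss -- an assumption-laden sketch, not the paper's method."""

    def __init__(self, num_experts, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.means = rng.normal(size=(num_experts, dim))
        self.log_vars = np.zeros((num_experts, dim))
        self.log_weights = np.full(num_experts, -np.log(num_experts))

    def log_joint(self, x):
        # x: (n, dim) -> (n, K) log of (mixture weight * component density)
        diff = x[:, None, :] - self.means[None, :, :]              # (n, K, d)
        inv_var = np.exp(-self.log_vars)[None, :, :]
        ll = -0.5 * ((diff ** 2) * inv_var
                     + self.log_vars[None] + np.log(2 * np.pi)).sum(-1)
        return ll + self.log_weights[None, :]

    def _responsibilities(self, x):
        logp = self.log_joint(x)
        resp = np.exp(logp - logp.max(-1, keepdims=True))          # stable softmax
        return resp / resp.sum(-1, keepdims=True)                  # (n, K)

    def route(self, x, top_k=2):
        """Sparse activation: keep the top-k experts per token, renormalize gates."""
        resp = self._responsibilities(x)
        idx = np.argsort(-resp, axis=-1)[:, :top_k]                # expert indices
        gates = np.take_along_axis(resp, idx, axis=-1)
        gates /= gates.sum(-1, keepdims=True)
        return idx, gates

    def em_step(self, x):
        """One EM update: fits the partition of input space with no task signal."""
        resp = self._responsibilities(x)                           # E-step
        nk = resp.sum(0) + 1e-8                                    # M-step below
        self.means = (resp.T @ x) / nk[:, None]
        var = (resp.T @ (x ** 2)) / nk[:, None] - self.means ** 2
        self.log_vars = np.log(np.clip(var, 1e-6, None))
        self.log_weights = np.log(nk / nk.sum())
```

Because the mixture parameters are updated only by EM on the inputs, routing boundaries stay stable while the experts themselves are trained on the task, which is the separation of concerns the summary describes.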
Y
Yongxiang Hua
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
H
Haoyu Cao
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
Z
Zhou Tao
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
B
Bocheng Li
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
Z
Zihao Wu
University of Georgia
Brain-inspired AI Β· Artificial General Intelligence Β· NLP Β· Medical Image Analysis
C
Chaohu Liu
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
L
Linli Xu
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China