The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the prevailing assumption that Mixture-of-Experts (MoE) models achieve domain specialization through sparse routing, introducing the COMMITTEEAUDIT framework to systematically analyze expert-level routing behavior. Through quantitative and qualitative evaluation of several representative MoE models on the MMLU benchmark, the analysis uncovers persistent “standing committees”: small subsets of experts that consistently dominate routing weights across domains and layers, regardless of the routing budget. These core experts anchor structural and syntactic reasoning, while peripheral experts handle only narrow, domain-specific knowledge. The findings indicate that the actual degree of specialization in MoE models is substantially lower than commonly assumed, and that current load-balancing training objectives may conflict with the model's intrinsic optimization dynamics.
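The summary does not show how a "standing committee" would be detected, so here is a minimal sketch of the kind of routing-mass audit the paper describes. This is not the authors' COMMITTEEAUDIT implementation; the function names, the toy data, and the assumption that per-token router probabilities have been logged as a (tokens × experts) array are all illustrative.

```python
import numpy as np

def committee(gate_probs: np.ndarray, mass_threshold: float = 0.5) -> set[int]:
    """Smallest expert set whose summed routing mass reaches mass_threshold."""
    per_expert = gate_probs.sum(axis=0)
    per_expert = per_expert / per_expert.sum()         # normalize to a distribution
    order = np.argsort(per_expert)[::-1]               # experts by descending mass
    cum = np.cumsum(per_expert[order])
    k = int(np.searchsorted(cum, mass_threshold)) + 1  # experts needed to cross threshold
    return set(order[:k].tolist())

# Toy data: 64 experts, 3 domains; a small subset is biased to dominate,
# mimicking the reported phenomenon. Replace with logged router outputs.
rng = np.random.default_rng(0)
alpha = np.full(64, 0.05)
alpha[:8] = 1.0
domains = {d: rng.dirichlet(alpha, size=1000) for d in ("law", "math", "bio")}

committees = {d: committee(p) for d, p in domains.items()}
standing = set.intersection(*committees.values())      # experts shared by all domains
print({d: len(c) for d, c in committees.items()})
print("standing committee:", sorted(standing))
```

A small, near-identical intersection across domains is the signature described above: most routing mass lands on the same compact coalition no matter what the input domain is.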

📝 Abstract
Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. In this work, we question this assumption by introducing COMMITTEEAUDIT, a post hoc framework that analyzes routing behavior at the level of expert groups rather than individual experts. Across three representative models and the MMLU benchmark, we uncover a domain-invariant Standing Committee: a compact coalition of routed experts that consistently captures the majority of routing mass across domains, layers, and routing budgets, even when architectures already include shared experts. Qualitative analysis further shows that Standing Committees anchor reasoning structure and syntax, while peripheral experts handle domain-specific knowledge. These findings reveal a strong structural bias toward centralized computation, suggesting that specialization in Mixture of Experts models is far less pervasive than commonly believed. This inherent bias also indicates that current training objectives, such as load-balancing losses that enforce uniform expert utilization, may work against the model's natural optimization path, thereby limiting training efficiency and performance.
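The domain-invariance claim lends itself to a simple overlap statistic. The sketch below (again an assumption about method, not the authors' code) compares the top-k expert sets selected per domain: pairwise Jaccard similarity near 1 means routing concentrates on the same coalition regardless of domain, while low similarity would indicate genuine domain specialization.

```python
from itertools import combinations
import numpy as np

def top_k_experts(gate_probs: np.ndarray, k: int) -> set[int]:
    """Indices of the k experts receiving the most total routing mass."""
    return set(np.argsort(gate_probs.sum(axis=0))[-k:].tolist())

def pairwise_jaccard(expert_sets: dict[str, set[int]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity of committee membership for every domain pair."""
    return {
        (a, b): len(expert_sets[a] & expert_sets[b]) / len(expert_sets[a] | expert_sets[b])
        for a, b in combinations(expert_sets, 2)
    }

# Reusing the per-domain (tokens x experts) gate probabilities from the sketch above:
# sets = {d: top_k_experts(p, k=8) for d, p in domains.items()}
# print(pairwise_jaccard(sets))   # values near 1.0 => domain-invariant committee
```

Sweeping k in this comparison is one way to probe the abstract's point about routing budgets: if the overlap stays high as the budget grows, the committee is structural rather than an artifact of a single top-k setting.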
Problem

Research questions and friction points this paper is trying to address.

Mixture of Experts
domain specialization
sparse routing
expert utilization
structural bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts
domain-invariant
Standing Committee
sparse routing
structural bias