Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models

📅 2026-03-28
🤖 AI Summary
This work investigates how sensitive the routing mechanisms of Mixture-of-Experts (MoE) language models are to demographic content, and whether that sensitivity can be used to improve generation fairness. The authors propose Fairness-Aware Routing Equilibrium (FARE), a diagnostic framework that probes the feasibility and limits of mitigating stereotypes through routing interventions across multiple MoE architectures. Using analyses of routing preferences, group-level expert masking, log-likelihood-based interventions, and generation evaluations on Mixtral, Qwen1.5, Qwen3, DeepSeekMoE, and OLMoE, the study finds that routing sensitivity to demographic cues is universal, but bias and core knowledge are deeply entangled within expert groups, which hinders fairness control. Routing-level preference shifts are unachievable for Mixtral, Qwen1.5, and Qwen3, statistically non-robust for DeepSeekMoE, and come with a substantial utility cost for OLMoE. Even where log-likelihood preference shifts are robust, they do not carry over to decoded generation, where results are null across all generation metrics.
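The log-likelihood-based evaluation mentioned in the summary can be pictured as a pairwise preference rate: for each sentence pair, check whether the model assigns a higher log-likelihood to the stereotypical sentence, then compare the rate before and after an intervention. The sketch below is only an illustration under that assumption; the function name and all numbers are hypothetical toy values, not results or code from the paper.

```python
def stereotype_preference_rate(pairs):
    """Percentage of pairs where the stereotypical sentence scores higher.

    pairs: list of (loglik_stereotypical, loglik_anti_stereotypical) tuples.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * preferred / len(pairs)


# Toy log-likelihoods (hypothetical), before and after a routing intervention.
baseline = [(-10.0, -12.0), (-8.0, -9.5), (-11.0, -10.0), (-7.0, -7.5)]
intervened = [(-10.5, -12.0), (-9.6, -9.5), (-11.0, -10.0), (-7.2, -7.5)]

# Shift in percentage points; negative means fewer stereotypical preferences.
shift_pp = stereotype_preference_rate(intervened) - stereotype_preference_rate(baseline)
print(f"preference shift: {shift_pp:+.1f} pp")
```

A shift measured this way concerns scored sentence pairs only; it says nothing by itself about what the model produces when decoding freely, which is the gap the study highlights.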
📝 Abstract
Mixture-of-Experts (MoE) language models are universally sensitive to demographic content at the routing level, yet exploiting this sensitivity for fairness control is structurally limited. We introduce Fairness-Aware Routing Equilibrium (FARE), a diagnostic framework designed to probe the limits of routing-level stereotype intervention across diverse MoE architectures. FARE reveals that routing-level preference shifts are either unachievable (Mixtral, Qwen1.5, Qwen3), statistically non-robust (DeepSeekMoE), or accompanied by substantial utility cost (OLMoE, -4.4%p CrowS-Pairs at -6.3%p TQA). Critically, even where log-likelihood preference shifts are robust, they do not transfer to decoded generation: expanded evaluations on both non-null models yield null results across all generation metrics. Group-level expert masking reveals why: bias and core knowledge are deeply entangled within expert groups. These findings indicate that routing sensitivity is necessary but insufficient for stereotype control, and identify specific architectural conditions that can inform the design of more controllable future MoE systems.
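To make the idea of group-level expert masking concrete, the following minimal sketch shows one way a top-k router can be prevented from selecting a set of experts: their logits are set to negative infinity before selection, and the gate weights of the remaining top-k experts are renormalised. This assumes a softmax over the selected top-k logits, which is one common convention; the function names and logits are hypothetical, and the abstract does not specify the paper's actual masking procedure.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k, masked=frozenset()):
    """Return {expert_index: gate_weight} for the top-k unmasked experts.

    Masked experts receive a logit of -inf, so they are never selected.
    Gate weights are a softmax over the selected logits and sum to 1.
    """
    available = sum(1 for i in range(len(logits)) if i not in masked)
    if available < k:
        raise ValueError("fewer than k unmasked experts remain")
    masked_logits = [
        -math.inf if i in masked else x for i, x in enumerate(logits)
    ]
    ranked = sorted(
        range(len(logits)), key=lambda i: masked_logits[i], reverse=True
    )
    top = ranked[:k]
    weights = softmax([masked_logits[i] for i in top])
    return dict(zip(top, weights))


# Hypothetical router logits for one token over four experts.
logits = [2.0, 1.0, 0.5, -1.0]

unmasked = route_top_k(logits, k=2)            # selects experts 0 and 1
with_mask = route_top_k(logits, k=2, masked={0})  # traffic moves to 1 and 2
print(sorted(unmasked), sorted(with_mask))
```

Masking forces the token through whichever experts remain, so if the masked group also carries core knowledge, task accuracy can drop along with the bias signal. That is the kind of entanglement the abstract describes.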
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
fairness
routing sensitivity
stereotype control
language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
fairness control
routing sensitivity
stereotype intervention
expert entanglement