Routing-Free Mixture-of-Experts

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel mixture-of-experts (MoE) architecture that eliminates the need for explicit routing mechanisms commonly found in traditional MoE models. By embedding activation logic directly within each expert and enabling end-to-end continuous gradient flow, experts autonomously determine their own activation without reliance on external routers, Softmax operations, Top-K selection, or hard-coded load-balancing heuristics. The approach introduces a unified, adaptive load-balancing framework that jointly optimizes resource allocation across both experts and tokens, supporting configurable dual-objective balancing. Experimental results demonstrate that the proposed model consistently outperforms existing baselines across multiple benchmarks, exhibiting superior scalability and robustness while removing rigid inductive biases imposed by centralized routing.
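The paper itself does not publish its equations in this summary, but the core idea — each expert carries its own differentiable gate, so activation strengths come from independent per-expert functions rather than a centralized Softmax/Top-K router — can be illustrated with a minimal sketch. Everything below (the `Expert` class, the sigmoid gate parameterized by a vector `u`, the layer sizes) is a hypothetical instantiation for intuition, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Expert:
    """A toy expert that owns its activation logic: a small MLP plus an
    internal scalar gate. The gate vector `u` is an illustrative assumption."""
    def __init__(self, d_model, d_hidden, rng):
        self.W1 = rng.standard_normal((d_model, d_hidden)) * 0.1
        self.W2 = rng.standard_normal((d_hidden, d_model)) * 0.1
        self.u = rng.standard_normal(d_model) * 0.1  # per-expert gate params

    def gate(self, x):
        # Independent sigmoid per token: continuous in (0, 1), fully
        # differentiable, and computed with no knowledge of other experts
        # (no Softmax normalization, no Top-K truncation).
        return sigmoid(x @ self.u)

    def forward(self, x):
        return np.maximum(x @ self.W1, 0.0) @ self.W2  # ReLU MLP

def routing_free_moe(x, experts):
    """Sum of gate-weighted expert outputs; no external router anywhere."""
    out = np.zeros_like(x)
    for e in experts:
        out += e.gate(x)[:, None] * e.forward(x)
    return out
```

Because each gate is an unnormalized sigmoid rather than a Softmax entry, gradients flow to every expert on every token, which is one plausible reading of the "end-to-end continuous gradient flow" claim.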
📝 Abstract
Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates all hard-coded centralized designs, including external routers, Softmax, Top-K selection, and load-balancing heuristics; instead, all activation functionality is encapsulated within individual experts and directly optimized through continuous gradient flow, enabling each expert to determine its activation entirely on its own. We introduce a unified adaptive load-balancing framework that simultaneously optimizes both expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE consistently outperforms baselines with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design and optimization.
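The abstract's "configurable interpolation" between expert-balancing and token-balancing objectives can be sketched as a single regularizer. The exact objective is not given here, so the variance-based terms and the interpolation weight `alpha` below are assumptions chosen only to show the dual-objective structure:

```python
import numpy as np

def balance_loss(gates, alpha):
    """Hypothetical interpolated load-balancing loss.

    gates: (T, E) matrix of per-token, per-expert activation strengths.
    alpha: in [0, 1]; 1.0 penalizes only uneven load across experts,
           0.0 penalizes only uneven total activation across tokens.
    """
    expert_load = gates.mean(axis=0)   # average activation per expert
    token_load = gates.sum(axis=1)     # total activation mass per token
    # Variance from the respective mean: zero iff perfectly balanced.
    expert_term = np.mean((expert_load - expert_load.mean()) ** 2)
    token_term = np.mean((token_load - token_load.mean()) ** 2)
    return alpha * expert_term + (1.0 - alpha) * token_term
```

Tuning `alpha` trades utilization balance across experts against uniform compute per token, which is one way to realize the "configurable dual-objective balancing" the summary describes.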
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
routing mechanism
inductive bias
load balancing
expert activation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Routing-Free MoE
adaptive load balancing
gradient-based expert activation
expert autonomy
continuous optimization