L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts

๐Ÿ“… 2026-01-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the limitations of existing Mixture-of-Experts (MoE) models, whose linear routing in high-dimensional raw input spaces suffers from representation mismatch, angular concentration, and sensitivity to feature scalingโ€”leading to weak routing discriminability and unstable expert specialization. To overcome these issues, the authors propose the L2R framework, which jointly introduces a low-rank latent routing space and a Lipschitz-constrained Saturated Inner Product Scoring (SIPS) mechanism. By employing a multi-anchor, parameter-efficient routing strategy, L2R enhances expert specialization while preserving the geometric smoothness of the routing function. The method significantly improves both routing stability and overall model performance, demonstrating consistent gains across large-scale language modeling and ImageNet vision MoE benchmarks.

๐Ÿ“ Abstract
Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes both the routing space and scoring geometry. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of routing functions, yielding smoother and more stable routing geometry. In addition, L2R incorporates a parameter-efficient multi-anchor routing mechanism to enhance expert expressiveness. Extensive experiments on a large-scale language MoE model and a vision MoE setting on ImageNet demonstrate that L2R consistently improves routing stability, expert specialization, and overall model performance.
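To make the abstract's three ingredients concrete (low-rank latent routing space, saturated inner-product scoring, multi-anchor experts), here is a minimal numpy sketch of such a router. The specific choices below are assumptions for illustration, not the paper's formulas: `tanh` as the saturating function, L2-normalization of the latent code, max-over-anchors aggregation, and the names `l2r_router`, `P`, `anchors`, `tau` are all hypothetical.

```python
import numpy as np

def l2r_router(h, P, anchors, tau=1.0, top_k=2):
    """Hypothetical L2R-style routing sketch (illustrative, not the paper's exact method).

    h:       (d,)      token hidden state
    P:       (d, r)    low-rank projection into a shared latent routing space (r << d)
    anchors: (E, M, r) M anchors per expert, living in the same latent space
    """
    # 1) Route in a low-rank latent space instead of the raw d-dim space.
    z = h @ P
    # Normalizing reduces scale sensitivity of the scores (assumption).
    z = z / (np.linalg.norm(z) + 1e-6)
    # 2) Saturated inner-product scoring: tanh bounds every score to (-1, 1),
    #    which also bounds the score function's Lipschitz constant by 1/tau.
    scores = np.tanh((anchors @ z) / tau)          # (E, M)
    # 3) Multi-anchor aggregation: each expert is scored by its best anchor.
    expert_scores = scores.max(axis=1)             # (E,)
    # Standard top-k gating with a softmax over the selected experts.
    top = np.argsort(expert_scores)[-top_k:][::-1]
    gates = np.exp(expert_scores[top])
    gates /= gates.sum()
    return top, gates

# Small demo with random weights.
rng = np.random.default_rng(0)
d, r, E, M = 16, 4, 8, 3
h = rng.normal(size=d)
P = rng.normal(size=(d, r))
anchors = rng.normal(size=(E, M, r))
top, gates = l2r_router(h, P, anchors)
```

Note the parameter efficiency the abstract alludes to: the router holds `d*r + E*M*r` weights rather than the `E*d` of a plain linear router, which is smaller whenever `r` is much less than `d`.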
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
routing
representation mismatch
angular concentration
scale-sensitive scoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank routing
Lipschitz control
Mixture-of-Experts
Saturated Inner-Product Scoring
Multi-anchor routing