Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a rigorous geometric theory explaining the expressive power of Mixture-of-Experts (MoE) architectures, particularly how sparse routing enhances model capacity. It establishes the first theoretical connection between MoE and tropical geometry by modeling the Top-k routing mechanism as a k-th elementary symmetric tropical polynomial. This formulation reveals that sparsity inherently induces combinatorial depth, and the paper introduces the notions of effective capacity and combinatorial resilience to characterize MoE's expressiveness on low-dimensional manifold data. Leveraging the hypersimplex normal-fan decomposition and the manifold hypothesis, the study proves that MoE avoids capacity collapse and that its expressive power scales with the binomial coefficient C(N, k), thereby providing a rigorous theoretical foundation for the topological advantages of conditional computation.
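The central identity is easy to check numerically: in the max-plus (tropical) semiring, the k-th elementary symmetric polynomial of the gate logits evaluates to the maximum subset sum over all size-k subsets, which is exactly the sum of the k largest logits that a Top-k router selects. A minimal sketch (not from the paper; `gate_logits` is an illustrative example):

```python
import itertools

def tropical_elem_sym_k(x, k):
    """k-th elementary symmetric polynomial of x in the max-plus semiring:
    tropical multiplication is +, tropical addition is max, so the
    polynomial evaluates to the max over all size-k subset sums."""
    return max(sum(subset) for subset in itertools.combinations(x, k))

def top_k_routing_score(x, k):
    """Sum of the k largest gate logits -- what a Top-k router picks out."""
    return sum(sorted(x, reverse=True)[:k])

gate_logits = [3, -1, 7, 2, 5]
# Both sides select experts {7, 5}: the two expressions agree.
assert tropical_elem_sym_k(gate_logits, 2) == top_k_routing_score(gate_logits, 2) == 12
```

The agreement holds for any logits and any k, since maximizing a subset sum over size-k subsets is achieved by taking the k largest entries.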

📝 Abstract
While Mixture-of-Experts (MoE) architectures define the state-of-the-art, their success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the normal fan of a hypersimplex, revealing that **sparsity is combinatorial depth**, which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of *Effective Capacity* under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit *Combinatorial Resilience*, maintaining high expressivity via the transversality of routing cones. Our framework unifies the discrete geometry of the hypersimplex with the continuous geometry of neural functions, offering a rigorous theoretical justification for the topological advantages of conditional computation.
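The $\binom{N}{k}$ scaling can also be seen empirically: the linear regions of the Top-k routing map are the maximal cones of the hypersimplex normal fan, one per size-k expert subset, so random gate vectors should hit all $\binom{N}{k}$ distinct selections. A minimal sketch (not from the paper; the sample count and Gaussian logits are illustrative choices):

```python
import random
from math import comb

# Count the distinct Top-k expert selections ("routing cones") hit by
# random gate vectors. With i.i.d. Gaussian logits every size-k subset
# is equally likely, so all C(N, k) cells appear with enough samples.
random.seed(0)
N, k = 6, 2
selections = set()
for _ in range(20000):
    logits = [random.gauss(0.0, 1.0) for _ in range(N)]
    top_k = frozenset(sorted(range(N), key=logits.__getitem__, reverse=True)[:k])
    selections.add(top_k)

assert len(selections) == comb(N, k)  # 15 routing regions for N=6, k=2
```

A dense network with the same parameter count has a single such region per layer; the combinatorial multiplication of regions is what the paper formalizes as combinatorial depth.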
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
sparsity
tropical geometry
expressivity
combinatorial depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
tropical geometry
combinatorial depth
Effective Capacity
Combinatorial Resilience
Ye Su
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Huayi Tang
Gaoling School of Artificial Intelligence, Renmin University of China
Zixuan Gong
PhD student, Renmin University of China (RUC)
LLM Theory
Yong Liu
Gaoling School of Artificial Intelligence, Renmin University of China