Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks

📅 2025-12-07
🤖 AI Summary
In edge networks, distributed Mixture-of-Experts (MoE) training suffers from heterogeneous device resources and stochastic token arrivals, which cause queue buildup, low resource efficiency, and system instability. To address this, we propose Stable-MoE, an online token routing framework grounded in Lyapunov optimization. The method jointly optimizes token routing decisions and computation frequency allocation without knowledge of future system states, while keeping the system stable in the long term. It has two key ingredients: (i) a gating consistency term that keeps routing decisions aligned with the experts selected by the gate, and (ii) energy queues at the edge devices that are stabilized alongside the token queues. Both are folded into a per-slot Lyapunov objective for real-time, adaptive control. Experiments on SVHN and CIFAR-100 show that the framework achieves at least 40% higher system throughput and at least 5% higher test accuracy than the baseline routing mechanisms.

📝 Abstract
The sparse activation mechanism of mixture-of-experts (MoE) models empowers edge intelligence with enhanced training efficiency and reduced computational resource consumption. However, traditional token routing in distributed MoE training faces significant challenges in resource-constrained edge networks, which are characterized by heterogeneous computing capabilities and stochastic token arrivals and therefore suffer from workload backlog, resource inefficiency, and performance degradation. To address this issue, we propose a novel Lyapunov-based token routing framework for distributed MoE training over resource-heterogeneous edge networks, termed Stable-MoE. Specifically, we formulate a stochastic optimization problem that maximizes both system throughput and gating consistency by optimizing the token routing strategy and the computational resource allocation, while ensuring long-term stability of both the token and energy queues at the edge devices. Using Lyapunov optimization, we transform the intractable long-term optimization problem into tractable per-slot subproblems, which enables online decisions on token routing and computation frequency without knowledge of future system states. Experimental results on the SVHN and CIFAR-100 datasets demonstrate that Stable-MoE outperforms the baselines with gains of at least 40% in system throughput and at least 5% in test accuracy.
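The step from a long-term problem to per-slot subproblems can be illustrated with the generic drift-plus-penalty form of Lyapunov optimization. The sketch below is not the paper's exact formulation: the symbols $Q_k(t)$ (token queue of device $k$), $E_k(t)$ (energy queue of device $k$), $a_k(t)$ and $b_k(t)$ (tokens routed to and processed by device $k$ in slot $t$), $U(t)$ (per-slot utility combining throughput and gating consistency), and the weight $V$ are all assumed notation.

```latex
% Assumed token-queue dynamics for device k
Q_k(t+1) = \max\{\, Q_k(t) - b_k(t),\ 0 \,\} + a_k(t)

% Quadratic Lyapunov function over the queue state \Theta(t) = \{Q_k(t), E_k(t)\}_k
L(t) = \frac{1}{2} \sum_{k} \left[ Q_k(t)^2 + E_k(t)^2 \right]

% One-slot conditional Lyapunov drift
\Delta(t) = \mathbb{E}\left[\, L(t+1) - L(t) \mid \Theta(t) \,\right]

% Per-slot drift-plus-penalty subproblem, with V \ge 0 trading utility against backlog
\min_{\text{routing},\ \text{frequency}} \quad \Delta(t) - V\, \mathbb{E}\left[\, U(t) \mid \Theta(t) \,\right]
```

Because each subproblem depends only on the queue state observed in the current slot, it can be solved online; a larger $V$ favors utility at the cost of longer queues.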
Problem

Research questions and friction points this paper is trying to address.

Optimizes token routing for distributed MoE training in edge networks
Addresses workload backlog and resource inefficiency from heterogeneous computing
Ensures long-term stability of token and energy queues at devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lyapunov-based token routing for distributed MoE training
Online optimization of routing and resource allocation per time slot
Ensures queue stability and improves throughput and accuracy
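The points above can be sketched as a toy per-slot routing loop. This is a minimal illustration under stated assumptions, not the Stable-MoE algorithm itself: the function names `route_slot`, `step_queues`, and `simulate`, the weight `V`, the service rates, and the arrival model are all hypothetical, and energy queues and frequency allocation are omitted.

```python
import random

def route_slot(queues, gate_scores, V):
    """Greedily assign each token of one slot to an expert.

    queues: current token backlog per expert
    gate_scores: one list of per-expert gating scores per token
    V: hypothetical weight trading gating score against backlog
    Returns the chosen expert index for each token.
    """
    backlog = list(queues)  # local copy so earlier picks affect later ones
    choices = []
    for scores in gate_scores:
        # Drift-plus-penalty style weight: reward gate score, penalize backlog.
        k = max(range(len(backlog)), key=lambda i: V * scores[i] - backlog[i])
        choices.append(k)
        backlog[k] += 1
    return choices

def step_queues(queues, choices, service):
    """Apply Q(t+1) = max(Q(t) - b(t), 0) + a(t) for every expert."""
    arrivals = [0] * len(queues)
    for k in choices:
        arrivals[k] += 1
    return [max(q - b, 0) + a for q, a, b in zip(queues, arrivals, service)]

def simulate(num_slots=200, V=5.0, seed=0):
    """Run the toy system and return the final backlog of each expert."""
    rng = random.Random(seed)
    service = [4, 2, 1]  # assumed heterogeneous tokens served per slot
    queues = [0.0] * len(service)
    for _ in range(num_slots):
        n_tokens = rng.randint(3, 7)  # assumed stochastic arrivals
        gate_scores = [[rng.random() for _ in service] for _ in range(n_tokens)]
        choices = route_slot(queues, gate_scores, V)
        queues = step_queues(queues, choices, service)
    return queues

if __name__ == "__main__":
    print(simulate())
```

Each decision uses only the backlog observed in the current slot, so no knowledge of future arrivals is needed. Raising `V` makes the router follow the gate more closely, while lowering it steers tokens toward lightly loaded experts.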
Long Shi
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Bingyan Ou
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Kang Wei
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Weihao Zhu
University of Illinois Urbana-Champaign
approximation algorithms, graph algorithms, hardness of approximation
Zhe Wang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Zhiyong Chen
Shanghai Jiao Tong University
6G networks, Wireless Communications, Computing and Caching Networks