🤖 AI Summary
In edge networks, distributed Mixture-of-Experts (MoE) training suffers from resource heterogeneity and dynamic token arrivals, leading to queue buildup, low efficiency, and system instability. To address this, we propose Stable-MoE, a novel online token routing framework grounded in Lyapunov optimization. Our method jointly optimizes token routing decisions and computation frequency allocation without requiring knowledge of future system states, while ensuring long-term system stability. The formulation combines two elements: (i) a gating consistency objective, maximized jointly with system throughput, and (ii) long-term stability constraints on the token and energy queues at the edge devices. Lyapunov optimization converts this long-term problem into per-slot subproblems that are solved online for real-time, adaptive control. Experiments on SVHN and CIFAR-100 show that the framework improves system throughput by at least 40% and test accuracy by at least 5% over the baseline routing mechanisms.
📝 Abstract
The sparse activation mechanism of mixture-of-experts (MoE) models empowers edge intelligence with enhanced training efficiency and reduced computational resource consumption. However, traditional token routing in distributed MoE training faces significant challenges in resource-constrained edge networks, which are characterized by heterogeneous computing capabilities and stochastic token arrivals; under these conditions, training inevitably suffers from workload backlog, resource inefficiency, and performance degradation. To address this issue, we propose a novel Lyapunov-based token routing framework for distributed MoE training over resource-heterogeneous edge networks, termed Stable-MoE. Specifically, we formulate a stochastic optimization problem that maximizes both system throughput and gating consistency by optimizing the token routing strategy and computational resource allocation, while ensuring long-term stability of both the token and energy queues at the edge devices. Using Lyapunov optimization, we transform the intractable long-term optimization problem into tractable per-slot subproblems, enabling online decisions on token routing and computation frequency utilization without knowledge of future system states. Experimental results on the SVHN and CIFAR-100 datasets demonstrate that Stable-MoE outperforms the baselines, with gains of at least 40% in system throughput and at least 5% in test accuracy.
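To make the per-slot idea concrete, the following is a minimal sketch of a generic Lyapunov drift-plus-penalty controller; it is not the Stable-MoE algorithm itself, whose exact objective and constraints are not given in the abstract. Everything in it is an assumption made for illustration: the function names `route_tokens`, `pick_frequency`, and `step_queues`, the trade-off weight `V`, the use of the raw gate score as the per-token reward, the cubic energy model `kappa * f**3`, and the linear service model `f / cycles_per_token`.

```python
import numpy as np

def route_tokens(gate_scores, Q, V):
    """Route each token to the expert maximizing V * gate_score - backlog.

    Hypothetical rule: a large V favors the gate's preference, a small V
    favors short queues. Q is only read; a local copy tracks in-slot load.
    """
    backlog = np.asarray(Q, dtype=float).copy()
    choices = []
    for scores in np.asarray(gate_scores, dtype=float):  # one row per token
        i = int(np.argmax(V * scores - backlog))
        choices.append(i)
        backlog[i] += 1.0  # the routed token adds to that expert's load
    return np.array(choices)

def pick_frequency(Q_i, Z_i, freqs, kappa=1.0, cycles_per_token=1.0):
    """Pick the frequency minimizing Z_i * energy(f) - Q_i * service(f).

    Assumed models (per unit-length slot): energy = kappa * f**3,
    service = f / cycles_per_token tokens.
    """
    freqs = np.asarray(freqs, dtype=float)
    energy = kappa * freqs**3
    service = freqs / cycles_per_token
    return float(freqs[np.argmin(Z_i * energy - Q_i * service)])

def step_queues(Q, Z, arrivals, served, energy, budget):
    """Advance the token queues Q and the virtual energy queues Z by one slot."""
    Q_next = np.maximum(Q - served, 0.0) + arrivals
    Z_next = np.maximum(Z + energy - budget, 0.0)
    return Q_next, Z_next
```

In this sketch, each decision uses only the current queue lengths and gate scores, which is what allows it to run online without forecasts of future arrivals. A device with a long token backlog is pushed toward a higher frequency, while a device whose virtual energy queue has grown is pushed toward a lower one.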