🤖 AI Summary
To address the dominant iteration latency caused by sparse and highly imbalanced all-to-all communication in distributed Mixture-of-Experts (MoE) training, this paper proposes RailS, a novel communication framework. First, it leverages the topological symmetry of Rail interconnects to formally prove that uniform send scheduling guarantees uniform receive scheduling. Second, it introduces a decentralized Longest-Processing-Time (LPT) spraying scheduler that transforms global load balancing into coordination-free local decisions. Third, it integrates topology-aware multipath transmission to activate multiple parallel rails for fine-grained bandwidth aggregation. Evaluated on both realistic (Mixtral) and synthetic MoE workloads, RailS improves effective bus bandwidth by 20%–78%, reduces communication completion time by 17%–78%, and decreases end-to-end iteration time by 18%–40%, approaching theoretically optimal load balance.
📝 Abstract
Training Mixture-of-Experts (MoE) models introduces sparse and highly imbalanced all-to-all communication that dominates iteration time. Conventional load-balancing methods fail to exploit the deterministic topology of Rail architectures, leaving multi-NIC bandwidth underutilized. We present RailS, a distributed load-balancing framework that minimizes all-to-all completion time in MoE training. RailS leverages the Rail topology's symmetry to prove that uniform sending ensures uniform receiving, transforming global coordination into local scheduling. Each node independently executes a Longest Processing Time First (LPT) spraying scheduler to proactively balance traffic using local information. RailS activates N parallel rails for fine-grained, topology-aware multipath transmission. Across synthetic and real-world MoE workloads, RailS improves bus bandwidth by 20%–78% and reduces completion time by 17%–78%. For Mixtral workloads, it shortens iteration time by 18%–40% and achieves near-optimal load balance, fully exploiting architectural parallelism in distributed training.
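The core scheduling idea, LPT spraying, follows the classic Longest-Processing-Time-first greedy heuristic: each node sorts its outgoing traffic chunks by size and assigns each chunk to the currently least-loaded rail, using only local information. The sketch below illustrates that greedy rule in isolation; `lpt_spray`, the chunk sizes, and the rail model are illustrative assumptions, not the paper's actual implementation.

```python
import heapq

def lpt_spray(chunk_sizes, num_rails):
    """Greedy LPT assignment of traffic chunks to parallel rails.

    Largest chunk first, each placed on the rail with the smallest
    accumulated load (a min-heap keyed by current load). Returns the
    per-rail chunk lists and the resulting per-rail loads.
    """
    # Min-heap of (current_load, rail_index); all rails start empty.
    heap = [(0, rail) for rail in range(num_rails)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_rails)]
    for size in sorted(chunk_sizes, reverse=True):
        load, rail = heapq.heappop(heap)
        assignment[rail].append(size)
        heapq.heappush(heap, (load + size, rail))
    loads = [sum(chunks) for chunks in assignment]
    return assignment, loads

# Example: seven unequal chunks sprayed over two rails.
assignment, loads = lpt_spray([7, 5, 4, 4, 3, 3, 2], num_rails=2)
```

Because every node applies the same deterministic rule to its own send queue, no global coordination is needed; the paper's symmetry argument is what extends this per-sender balance to balanced receiving as well.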