🤖 AI Summary
Existing load-balancing mechanisms for low-diameter topologies such as Dragonfly and Slim Fly either rely on proprietary in-network hardware or fail to exploit the full path diversity these topologies offer. This work proposes Spritz, a sender-based load-balancing framework that moves topology-aware adaptive routing to the endpoints using only standard Ethernet features, with no additional hardware support. Spritz comprises two algorithms, Spritz-Scout and Spritz-Spray, that use ECN marks, packet trimming, and timeouts as feedback to explore paths and adaptively cache efficient ones. In simulations of Dragonfly and Slim Fly networks with over 1000 endpoints, Spritz improves flow completion time by up to 1.8× over ECMP, UGAL-L, and prior sender-based approaches under AI training and datacenter workloads, and by up to 25.4× under link failures.
📝 Abstract
Low-diameter topologies such as Dragonfly and Slim Fly are increasingly adopted in HPC and datacenter networks, yet existing load-balancing techniques either rely on proprietary in-network mechanisms or fail to utilize the full path diversity of these topologies. We introduce Spritz, a flexible sender-based load-balancing framework that shifts adaptive topology-aware routing to the endpoints using only standard Ethernet features. We propose two algorithms, Spritz-Scout and Spritz-Spray, that respectively explore and adaptively cache efficient paths using ECN, packet trimming, and timeout feedback. In simulations of Dragonfly and Slim Fly topologies with over 1000 endpoints, Spritz outperforms ECMP, UGAL-L, and prior sender-based approaches by up to 1.8× in flow completion time under AI training and datacenter workloads, and it offers robust failover, with performance improvements of up to 25.4× under link failures, all without additional hardware support. Spritz enables datacenter-scale, commodity Ethernet networks to efficiently leverage low-diameter topologies, offering unified routing and load balancing for the Ultra Ethernet era.
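As a rough illustration of the sender-side feedback loop the abstract describes, the Python sketch below shows one way an endpoint could combine random path exploration with a small cache of paths that recently delivered cleanly. It is not the Spritz-Scout or Spritz-Spray algorithm, whose details the abstract does not give. The class `PathSelector`, the signal labels, the integer path ids, and the eviction policy are all hypothetical choices made for illustration.

```python
import random
from collections import OrderedDict

# Hypothetical feedback labels: the abstract names the signals
# (ECN, packet trimming, timeout) but not any concrete API.
ACK, ECN, TRIM, TIMEOUT = "ack", "ecn", "trim", "timeout"


class PathSelector:
    """Toy sender-side path chooser with a small cache of good paths.

    Illustrative only; not the algorithm from the paper.
    """

    def __init__(self, num_paths, cache_size=4, seed=None):
        self.num_paths = num_paths
        self.cache_size = cache_size
        self.cache = OrderedDict()  # path id -> None, oldest entry first
        self.rng = random.Random(seed)

    def choose(self):
        # Reuse the most recently confirmed path if the cache is non-empty,
        # otherwise explore a path chosen uniformly at random.
        if self.cache:
            return next(reversed(self.cache))
        return self.rng.randrange(self.num_paths)

    def feedback(self, path, signal):
        if signal == ACK:
            # Clean delivery: (re)insert as most recent, evict the oldest.
            self.cache.pop(path, None)
            self.cache[path] = None
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)
        elif signal in (ECN, TRIM, TIMEOUT):
            # Congestion or loss signal: stop preferring this path.
            self.cache.pop(path, None)
        else:
            raise ValueError(f"unknown signal: {signal!r}")


if __name__ == "__main__":
    sel = PathSelector(num_paths=8, cache_size=2, seed=0)
    sel.feedback(3, ACK)       # path 3 delivered cleanly, so cache it
    print(sel.choose())        # reuses the cached path
    sel.feedback(3, TIMEOUT)   # path 3 appears to have failed, so evict it
    print(len(sel.cache))      # cache is empty again; next choice explores
```

In this sketch a timeout evicts a path in the same way as an ECN mark or a trimmed packet. A real design would likely weigh these signals differently, since a timeout suggests a failed link while an ECN mark suggests only transient congestion.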