Whack-a-Mole: Deterministic Packet Spraying Across Multiple Network Paths

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
In large-scale AI/ML training, collective completion time (CCT) is highly sensitive to tail latency and traffic imbalance across network paths. To address this, we propose a deterministic multipath packet spraying algorithm. Our method combines a bit-reversal counter with a discrete path allocation model to achieve low-dispersion load distribution of *m* data units across *n* parallel paths, with a theoretical guarantee that the path-selection discrepancy over any contiguous packet sequence is bounded by *O*(log *m*). The algorithm supports congestion-feedback–driven dynamic path reallocation and is compatible with erasure-coded transport. Experimental results demonstrate significant reductions in tail latency, improved GPU utilization, and gains in CCT and effective training time ratio (ETTR), boosting both the efficiency and scalability of distributed training.

📝 Abstract
We present Whack-a-Mole, a deterministic packet spraying algorithm for distributing packets across multiple network paths with provably tight discrepancy bounds. The algorithm is motivated by large-scale distributed AI/ML training and inference workloads, where collective completion time (CCT) and effective training time ratio (ETTR) are highly sensitive to tail latency and transport imbalance. Whack-a-Mole represents the path profile as a discrete allocation of $m$ selection units across $n$ paths and uses a bit-reversal counter to choose a path for each packet. We prove that the discrepancy between expected and actual packet counts per path is bounded by $O(\log m)$ over any contiguous packet sequence. The algorithm responds quickly to congestion feedback by reducing allocations to degraded paths and redistributing load to healthier ones. This combination of deterministic distribution, low per-packet overhead, and compatibility with erasure-coded transport makes Whack-a-Mole an effective building block for multipath transport protocols that aim to minimize CCT and maximize GPU utilization.
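To make the core mechanism concrete, here is a minimal illustrative sketch of bit-reversal-counter path selection over a discrete slot allocation. This is not the paper's implementation; the class and function names (`Sprayer`, `build_slots`, `bit_reverse`), the power-of-two restriction on $m$, and the proportional rounding scheme are all assumptions made for the example.

```python
def bit_reverse(i, bits):
    """Reverse the low `bits` bits of integer i (assumed helper)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def build_slots(weights, m):
    """Allocate m selection units across paths, proportional to weights.

    Rounding may leave us a unit short; pad with the heaviest path
    (an illustrative tie-breaking choice, not from the paper).
    """
    total = sum(weights)
    slots = []
    for path, w in enumerate(weights):
        slots += [path] * round(m * w / total)
    while len(slots) < m:
        slots.append(max(range(len(weights)), key=lambda p: weights[p]))
    return slots[:m]

class Sprayer:
    """Deterministic spraying: walk the slot table in bit-reversed order,
    so consecutive packets land on widely separated slots and every
    window of m packets hits each slot exactly once."""

    def __init__(self, weights, m=16):
        assert m & (m - 1) == 0, "sketch assumes m is a power of two"
        self.bits = m.bit_length() - 1
        self.slots = build_slots(weights, m)
        self.counter = 0

    def next_path(self):
        slot = bit_reverse(self.counter & (len(self.slots) - 1), self.bits)
        self.counter += 1
        return self.slots[slot]
```

For example, with path weights `[1, 1, 2]` and $m = 16$, every run of 16 packets sends exactly 4, 4, and 8 packets to the three paths, and the bit-reversed order interleaves the heavy path with the light ones rather than bursting.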
Problem

Research questions and friction points this paper is trying to address.

Minimizing tail latency for distributed AI/ML training workloads
Reducing transport imbalance across multiple network paths
Improving collective completion time and GPU utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic packet spraying using bit-reversal counter selection
Quick congestion response by reallocating degraded path loads
Low-overhead multipath transport with provable discrepancy bounds
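The congestion-response idea in the bullets above can be sketched as a weight update: on feedback that a path is degraded, shrink its share and renormalize, so the freed load flows to the healthier paths. The function name, the multiplicative `factor`, and the renormalization step are illustrative assumptions, not details taken from the paper.

```python
def reallocate(weights, degraded, factor=0.5):
    """On congestion feedback for path `degraded`, multiplicatively
    reduce its weight and renormalize so the remaining paths absorb
    the freed share. Returns new normalized weights."""
    new = list(weights)
    new[degraded] *= factor
    total = sum(new)
    return [w / total for w in new]
```

In a full transport, the updated weights would then be used to rebuild the slot allocation before the next spraying window.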