LiteGD: Lightweight and Dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high communication latency and low resource utilization caused by physical proximity-based scheduling under irregular bandwidth conditions in large-scale heterogeneous GPU clusters, this paper proposes a lightweight, globally aware dynamic scheduling framework. Methodologically, it introduces (1) a computation-aware, lightweight Transformer architecture that models GPU topology relationships with dynamic bandwidth awareness; (2) a bidirectional tree search algorithm that finds near-optimal schedules quickly; and (3) a network design that transfers across architectures and scales linearly with cluster size. Experimental results demonstrate that the framework achieves up to 90% GPU bandwidth utilization across diverse heterogeneous clusters and 80% on a real-world H100 cluster, significantly outperforming both default scheduling and state-of-the-art topology-aware baselines.

📝 Abstract
Parallel computing with multiple GPUs has become the dominant paradigm for machine learning tasks, especially those of large language models (LLMs). To reduce the latency incurred by inter-GPU communication, a common practice for parallel tasks has been to allocate GPUs based on their physical proximity. However, this long-standing assumption has notable limitations, particularly in large-scale, heterogeneous GPU clusters where bandwidth distribution among GPUs is irregular. In this paper, we introduce LiteGD, a lightweight and dynamic GPU dispatching system based on a global perspective. To tackle the difficulty of storing massive GPU topology information, LiteGD adopts a computation-aware design that leverages a lightweight Transformer network trained on sampled data. Our customized network structure ensures both transferability and scalability. LiteGD also employs a bidirectional tree search over the topology representation produced in the previous step, which identifies near-optimal GPU dispatching solutions while reducing search overhead. We implement and evaluate LiteGD in both real and simulated GPU clusters with homogeneous and heterogeneous interconnects, respectively. Experimental results demonstrate that LiteGD consistently achieves high GPU bandwidth efficacy (approximately 90%) across various cluster configurations and 80% in a real-world H100 cluster, significantly outperforming conventional default and interconnect topology-aware dispatching methods, particularly in large-scale heterogeneous environments.
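The paper's bidirectional tree search is not reproduced in this summary; as a rough illustration of the underlying problem of dispatching a GPU group over an irregular bandwidth topology, here is a greedy seed-and-grow sketch (all names and the scoring choice are hypothetical, not taken from the paper):

```python
import itertools

def dispatch(bandwidth, k):
    """Pick k GPUs that maximize the minimum pairwise bandwidth.

    bandwidth: symmetric matrix (list of lists), bandwidth[i][j] in GB/s.
    k: number of GPUs requested.
    Greedy sketch for illustration, not LiteGD's bidirectional tree search.
    """
    n = len(bandwidth)
    best_group, best_score = None, -1.0
    # Try every GPU as a seed, then greedily grow the group from it.
    for seed in range(n):
        group = [seed]
        while len(group) < k:
            # Add the GPU whose weakest link into the group is strongest.
            cand = max((g for g in range(n) if g not in group),
                       key=lambda g: min(bandwidth[g][m] for m in group))
            group.append(cand)
        # Score a group by its slowest internal link (the bottleneck).
        score = min(bandwidth[a][b]
                    for a, b in itertools.combinations(group, 2))
        if score > best_score:
            best_group, best_score = sorted(group), score
    return best_group, best_score
```

On a four-GPU cluster with two fast NVLink islands and slow cross-island links, this picks a group inside one island for a two-GPU request, which is the kind of allocation a proximity-only scheduler can miss once bandwidth is irregular.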
Problem

Research questions and friction points this paper is trying to address.

Optimizing GPU allocation in heterogeneous clusters
Reducing inter-GPU communication latency dynamically
Enhancing bandwidth efficiency in large-scale systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Transformer for GPU topology
Bidirectional tree search optimization
Computation-aware scalable design
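The headline metric, GPU bandwidth efficacy, can plausibly be read as the effective bandwidth an allocation achieves relative to the best achievable allocation for the same request; a minimal illustration under that assumed reading (the paper's formal definition may differ):

```python
def bandwidth_efficacy(achieved_gbps, optimal_gbps):
    """Ratio of a dispatcher's achieved effective bandwidth to the
    optimum for the same request; 1.0 means a perfect allocation.
    Assumed interpretation of the metric, not the paper's definition.
    """
    return achieved_gbps / optimal_gbps

# e.g. an allocation sustaining 720 GB/s where 800 GB/s was achievable
# corresponds to the ~90% efficacy figure reported in the summary.
```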
Kunming Zhang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
Hanlong Liao
National University of Defense Technology, Changsha, Hunan, China
Guoming Tang
The Hong Kong University of Science and Technology (Guangzhou)
Sustainable Computing / AI Cloud / Edge Computing / AI4Sus