🤖 AI Summary
This work addresses the challenge of simultaneously guaranteeing service-level objectives (SLOs) for high-priority tasks and achieving high resource utilization on closed-source GPUs, where existing sharing techniques are limited by the lack of fine-grained scheduling. The authors propose an SLO-aware GPU scheduling system that, for the first time, enables microsecond-scale preemption on closed-source GPUs. By combining SLO-driven scheduling with the reclamation of idle GPU time slices, the system avoids the trade-off between SLO compliance and throughput that spatial and temporal sharing approaches face. Compared with state-of-the-art sharing schemes, it improves SLO attainment for high-priority tasks by 9.7× over spatial sharing and 3.5× over temporal sharing, and its SLO attainment is less than 1% lower than that of exclusive execution. It also delivers 2.4× higher throughput for low-priority tasks than state-of-the-art temporal sharing, and it was evaluated across multiple GPU architectures.
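To make the trade-off between SLO compliance and throughput concrete, the following is a toy single-GPU timeline model written purely for illustration. It is not Hummingbird's mechanism: the job format, the fixed preemption cost `preempt_us`, the low-priority kernel length `lp_kernel_us`, and the names `simulate` and `SimResult` are all hypothetical. Low-priority work fills every gap left by high-priority jobs. When a high-priority job arrives, the model either preempts the running low-priority kernel at a fixed cost or, as a simplified stand-in for temporal sharing without fine-grained preemption, makes the high-priority job wait for the in-flight kernel to finish.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimResult:
    hp_latencies_us: List[int]  # per high-priority job: finish time minus arrival time
    lp_busy_us: int             # GPU time spent on low-priority kernels


def simulate(
    hp_jobs: List[Tuple[int, int]],
    horizon_us: int,
    preempt_us: Optional[int],
    lp_kernel_us: int,
) -> SimResult:
    """Hypothetical toy timeline: one GPU, one kernel running at a time.

    hp_jobs: (arrival_us, duration_us) of high-priority (HP) jobs, sorted by arrival.
    The low-priority (LP) task is always backlogged and fills every HP-idle gap
    with back-to-back kernels of length lp_kernel_us.
    preempt_us: assumed fixed cost to preempt a running LP kernel (its progress
        is kept); None means no preemption, so HP waits for the in-flight LP kernel.
    Assumes every HP job finishes before horizon_us.
    """
    t = 0  # time at which the GPU is next free of HP work
    lp_busy = 0
    latencies: List[int] = []
    for arrival, duration in hp_jobs:
        if arrival > t:
            # Idle gap for HP work: it is harvested by LP kernels.
            elapsed = arrival - t
            if preempt_us is not None:
                wait = preempt_us                 # pay the preemption cost
                lp_busy += elapsed
            else:
                wait = (-elapsed) % lp_kernel_us  # remainder of the in-flight LP kernel
                lp_busy += elapsed + wait
            start = arrival + wait
        else:
            start = t  # HP job queues behind the previous HP job
        t = start + duration
        latencies.append(t - arrival)
    lp_busy += max(0, horizon_us - t)  # LP keeps running after the last HP job
    return SimResult(latencies, lp_busy)


# Hypothetical HP jobs (arrival_us, duration_us); not a workload from the paper.
jobs = [(100, 50), (400, 50), (420, 30)]
preemptive = simulate(jobs, 1000, preempt_us=5, lp_kernel_us=100)
blocking = simulate(jobs, 1000, preempt_us=None, lp_kernel_us=100)
print(preemptive)  # SimResult(hp_latencies_us=[55, 55, 65], lp_busy_us=860)
print(blocking)    # SimResult(hp_latencies_us=[50, 100, 110], lp_busy_us=870)
```

With a hypothetical 70-microsecond latency SLO, the preemptive run meets the target for all three high-priority jobs, while the blocking run meets it only for the first; in both runs the low-priority task still receives most of the GPU time. Setting `preempt_us=0` reproduces the latencies the high-priority jobs would see when running alone ([50, 50, 60]).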
📝 Abstract
Existing GPU-sharing techniques, including spatial and temporal sharing, aim to improve utilization but struggle to ensure SLO adherence and maximize efficiency at the same time, because closed-source GPUs lack fine-grained task scheduling. This paper presents Hummingbird, an SLO-oriented GPU scheduling system that overcomes these challenges by enabling microsecond-scale preemption on closed-source GPUs while effectively harvesting idle GPU time slices. Comprehensive evaluations across diverse GPU architectures show that Hummingbird improves the SLO attainment of high-priority tasks by 9.7x and 3.5x compared with state-of-the-art spatial-sharing and temporal-sharing approaches, respectively. Compared with exclusive execution, the SLO attainment of a high-priority task collocated with low-priority tasks on Hummingbird drops by less than 1%. Meanwhile, the throughput of the low-priority task is 2.4x that achieved by state-of-the-art temporal-sharing approaches. These results show that Hummingbird ensures SLOs while improving GPU utilization.
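The figures above compare SLO attainment across systems. This Python sketch assumes the common definition of SLO attainment: the fraction of requests that finish within the latency target. It reads a factor such as 9.7x as the ratio of two attainment values and expresses the drop relative to exclusive execution in percentage points. The helper names `slo_attainment` and `improvement_factor`, the 20 ms target, and all latency samples are hypothetical, not data from the paper.

```python
from typing import Sequence


def slo_attainment(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Fraction of requests that finish within the latency SLO (assumed definition)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    return sum(1 for lat in latencies_ms if lat <= slo_ms) / len(latencies_ms)


def improvement_factor(new: float, baseline: float) -> float:
    """Ratio of two attainment values; 9.7 would be read as '9.7x'."""
    return new / baseline


# Hypothetical per-request latencies (ms) for one high-priority task.
SLO_MS = 20.0
exclusive = [12.0, 14.0, 15.0, 18.0, 19.0, 16.0, 13.0, 17.0, 21.0, 15.0]   # running alone
collocated = [12.5, 14.5, 15.5, 18.5, 19.5, 16.5, 13.5, 17.5, 22.0, 15.5]  # sharing the GPU
baseline = [18.0, 25.0, 30.0, 22.0, 19.0, 27.0, 24.0, 21.0, 26.0, 23.0]    # another sharing scheme

att_excl = slo_attainment(exclusive, SLO_MS)     # 9 of 10 within SLO
att_coll = slo_attainment(collocated, SLO_MS)    # 9 of 10 within SLO
att_base = slo_attainment(baseline, SLO_MS)      # 2 of 10 within SLO
drop_pp = (att_excl - att_coll) * 100            # drop vs. exclusive, in percentage points
factor = improvement_factor(att_coll, att_base)  # about 4.5x in this made-up example
```

Whether the paper's "less than 1%" is an absolute or a relative drop is not stated in the abstract; the sketch simply picks percentage points.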