Hummingbird: SLO-Oriented GPU Preemption at Microsecond-scale

📅 2026-01-07

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of guaranteeing service-level objectives (SLOs) for high-priority tasks while keeping resource utilization high on closed-source GPUs, where existing spatial and temporal sharing techniques are limited by the lack of fine-grained task scheduling. The authors propose Hummingbird, an SLO-oriented GPU scheduling system that enables microsecond-scale preemption on closed-source GPUs and harvests idle GPU time slices. By combining SLO-driven scheduling with idle time-slice reclamation, the system aims to avoid the trade-off between SLO compliance and throughput seen in spatial and temporal sharing. Evaluated across diverse GPU architectures, Hummingbird improves the SLO attainment of high-priority tasks by 9.7× over state-of-the-art spatial sharing and by 3.5× over state-of-the-art temporal sharing. When a high-priority task is collocated with low-priority tasks, its SLO attainment drops by less than 1% relative to exclusive execution. The throughput of low-priority tasks outperforms state-of-the-art temporal-sharing approaches by 2.4×.

📝 Abstract

Existing GPU-sharing techniques, including spatial and temporal sharing, aim to improve utilization but face challenges in simultaneously ensuring SLO adherence and maximizing efficiency due to the lack of fine-grained task scheduling on closed-source GPUs. This paper presents Hummingbird, an SLO-oriented GPU scheduling system that overcomes these challenges by enabling microsecond-scale preemption on closed-source GPUs while effectively harvesting idle GPU time slices. Comprehensive evaluations across diverse GPU architectures reveal that Hummingbird improves the SLO attainment of high-priority tasks by 9.7x and 3.5x compared to the state-of-the-art spatial and temporal-sharing approaches. When compared to executing exclusively, the SLO attainment of the high-priority task, collocating with low-priority tasks on Hummingbird, only drops by less than 1%. Meanwhile, the throughput of the low-priority task outperforms the state-of-the-art temporal-sharing approaches by 2.4x. Hummingbird demonstrates significant effectiveness in ensuring the SLO while enhancing GPU utilization.
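The abstract reports results in terms of SLO attainment. As a minimal sketch, assuming SLO attainment means the fraction of requests that finish within a latency target (the listing does not give the paper's exact definition), the metric can be computed as below. The function name `slo_attainment`, the 10 ms target, and all latency values are hypothetical and serve only as illustration; they are not measurements from the paper.

```python
from typing import Sequence


def slo_attainment(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Return the fraction of requests whose latency is within the SLO target.

    Assumption: a request meets the SLO when latency <= slo_ms.
    """
    if not latencies_ms:
        raise ValueError("need at least one request latency")
    met = sum(1 for latency in latencies_ms if latency <= slo_ms)
    return met / len(latencies_ms)


# Hypothetical latencies (ms) for a high-priority task; not data from the paper.
exclusive_ms = [8.0, 9.0, 9.5, 10.0]   # task runs alone on the GPU
collocated_ms = [8.2, 9.1, 9.8, 12.0]  # task shares the GPU with low-priority work
slo_target_ms = 10.0                   # hypothetical latency target

exclusive_rate = slo_attainment(exclusive_ms, slo_target_ms)
collocated_rate = slo_attainment(collocated_ms, slo_target_ms)

# The drop in attainment relative to exclusive execution is the quantity
# the abstract describes as staying below 1% under Hummingbird.
drop = exclusive_rate - collocated_rate
print(f"exclusive={exclusive_rate:.2f} collocated={collocated_rate:.2f} drop={drop:.2f}")
```

In this toy data one of the four collocated requests misses the target, so the drop is far larger than the sub-1% figure reported in the abstract; the sketch only shows how such a comparison would be computed.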

Problem

Research questions and friction points this paper is trying to address.

GPU sharing

SLO adherence

fine-grained scheduling

closed-source GPUs

utilization efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU preemption

SLO-oriented scheduling

microsecond-scale scheduling