gpu_ext: Extensible OS Policies for GPUs via eBPF

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modern GPU systems suffer from inflexible, static resource management that adapts poorly to diverse workloads: user-space runtimes lack cross-tenant visibility and hardware control, while kernel-level modifications introduce security vulnerabilities and maintenance overhead. This paper introduces the first eBPF-based policy runtime for GPUs, abstracting GPU drivers and hardware as a programmable OS subsystem. Its key contributions are: (1) a lightweight device-side eBPF virtual machine that safely executes verified policies within GPU kernels; and (2) a secure, driver-level hooking mechanism. Together they provide programmability, fine-grained hardware control, and multi-tenant observability. Evaluated on inference, training, and vector search workloads, the approach achieves up to 4.8× higher throughput and up to 2× lower tail latency. Policy deployment requires no application modification and no driver restart, with runtime overhead under 3%.

📝 Abstract
Performance in modern GPU-centric systems depends increasingly on resource management policies, such as memory placement, scheduling, and observability. However, a one-size-fits-all policy performs poorly across diverse workloads. Existing approaches present a tradeoff: user-space runtimes offer programmability but lack cross-tenant visibility and fine-grained hardware control, while OS kernel modifications introduce complexity and safety risks. To address this, we argue that the GPU driver and device layer must serve as an extensible OS policy interface. eBPF offers one way to build such an interface, but naively transplanting host-side eBPF is insufficient: it cannot observe critical device-side events, and directly injecting policy code into GPU kernels can compromise safety and efficiency. We present gpu_ext, an eBPF-based policy runtime that treats the GPU driver and device as a programmable OS subsystem. gpu_ext extends GPU drivers to expose safe hooks and introduces a device-side eBPF runtime that executes verified policy logic within GPU kernels, enabling coherent, application-transparent policies. Evaluation on realistic workloads, including inference, training, and vector search, shows that gpu_ext improves throughput by up to 4.8x and reduces tail latency by up to 2x with low overhead, without modifying applications or restarting drivers.
Problem

Research questions and friction points this paper is trying to address.

Extensible OS policies for GPU resource management
Overcoming limitations of user-space runtimes and kernel modifications
Enabling safe and efficient device-side policy execution via eBPF
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends GPU drivers with safe hooks for eBPF
Introduces device-side eBPF runtime within GPU kernels
Enables coherent, application-transparent policies without modifying applications or restarting drivers
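
The idea of a driver hook whose policy can be swapped at runtime can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: it is not eBPF, not GPU code, and not the gpu_ext API. The names `PageFaultEvent`, `HookPoint`, `default_policy`, and `hot_page_policy`, along with the access-count threshold, are hypothetical and chosen for illustration. A real eBPF verifier checks a program statically before it is loaded, whereas this model merely validates each decision at call time and falls back to a default.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical event handed to a memory-placement hook (illustrative fields only).
@dataclass(frozen=True)
class PageFaultEvent:
    tenant_id: int
    page_id: int
    access_count: int

# A policy maps an event to a placement decision.
Policy = Callable[[PageFaultEvent], str]
VALID_DECISIONS = {"device", "host"}


def default_policy(event: PageFaultEvent) -> str:
    """Built-in behavior used when no policy is attached or a policy misbehaves."""
    return "host"


class HookPoint:
    """A named hook with an optional, runtime-attachable policy."""

    def __init__(self, name: str, fallback: Policy) -> None:
        self.name = name
        self._fallback = fallback
        self._policy: Optional[Policy] = None

    def attach(self, policy: Policy) -> None:
        # Swap the policy in place; callers of fire() are not restarted.
        self._policy = policy

    def detach(self) -> None:
        self._policy = None

    def fire(self, event: PageFaultEvent) -> str:
        policy = self._policy or self._fallback
        try:
            decision = policy(event)
        except Exception:
            decision = None
        # Reject anything outside the allowed decision set.
        if not isinstance(decision, str) or decision not in VALID_DECISIONS:
            return self._fallback(event)
        return decision


def hot_page_policy(event: PageFaultEvent) -> str:
    """Hypothetical policy: keep frequently accessed pages in device memory."""
    return "device" if event.access_count >= 8 else "host"


if __name__ == "__main__":
    hook = HookPoint("memory_placement", default_policy)
    event = PageFaultEvent(tenant_id=1, page_id=42, access_count=10)
    print(hook.fire(event))   # no policy attached: fallback decides
    hook.attach(hot_page_policy)
    print(hook.fire(event))   # attached policy decides
```

The fallback path mirrors the general design goal that a faulty or missing policy should never leave the hook without a valid decision; how gpu_ext itself enforces this is not described in the text above.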
Yusheng Zheng, UC Santa Cruz
Tong Yu, Adobe Research
Yiwei Yang, UC Santa Cruz
Minghui Jiang, Alibaba Group
Xiangyu Gao, University of Washington
Jianchang Su, University of Connecticut (Cloud Computing, System Security, Efficient LLM System)
Yanpeng Hu, ShanghaiTech University
Wenan Mao, Alibaba Group
Wei Zhang, University of Connecticut
Dan Williams, Unknown affiliation
Andi Quinn, UC Santa Cruz