GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management

📅 2025-09-14
🤖 AI Summary
To address the high eviction rates and prolonged queuing delays of low-priority (LP) jobs in GPU clusters running large language model workloads, this paper proposes GFS, a preemptive, SLO-aware scheduling framework. The method has three parts. (1) It builds a lightweight, tenant-level forecasting model for GPU demand time series. (2) Based on predicted demand, it dynamically adjusts the spot quota for LP jobs that come with guaranteed durations. (3) It introduces a priority-aware, fine-grained preemption policy that minimizes disruption to LP jobs. In real-world deployment and simulation, GFS reduces LP eviction rates by 33.0%, cuts LP queuing delays by 44.1%, and increases the GPU allocation rate by up to 22.8%. In a production cluster of more than 10,000 GPUs, it yields roughly $459,715 in monthly benefits. The core contribution is an integrated design that unifies tenant-level resource forecasting, elastic quota control, and low-impact preemption. This design lets high-priority (HP) and LP jobs share a cluster efficiently, improving HP SLO compliance while limiting preemption of LP jobs.
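The summary does not say which forecasting model GFS uses or how the quota is derived from it. The sketch below shows the general forecast-then-quota idea under stated assumptions. Simple exponential smoothing stands in for the paper's lightweight per-tenant model. The spot quota is taken as cluster capacity minus the summed HP forecasts minus a safety headroom. `TenantForecaster`, `spot_quota`, `alpha`, and `headroom` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TenantForecaster:
    """Hypothetical stand-in for a lightweight per-tenant demand model:
    simple exponential smoothing over observed HP GPU demand."""
    alpha: float = 0.5              # assumed smoothing factor
    level: Optional[float] = None   # current smoothed demand estimate

    def update(self, observed_gpus: float) -> float:
        # The first observation initialises the level; later ones are blended in.
        if self.level is None:
            self.level = float(observed_gpus)
        else:
            self.level = self.alpha * observed_gpus + (1 - self.alpha) * self.level
        return self.level


def spot_quota(capacity: int, forecasts: Dict[str, float], headroom: float = 0.1) -> int:
    """GPUs offered to LP (spot) jobs: cluster capacity minus the summed
    per-tenant HP forecasts, minus an assumed safety headroom (a fraction
    of capacity). Never negative."""
    free = capacity - sum(forecasts.values()) - headroom * capacity
    return max(0, int(free))


if __name__ == "__main__":
    history = {"tenant-a": [100, 120, 140], "tenant-b": [50, 50, 50]}
    forecasts = {}
    for tenant, series in history.items():
        model = TenantForecaster()
        for gpus in series:
            model.update(gpus)
        forecasts[tenant] = model.level
    print(forecasts)                   # {'tenant-a': 125.0, 'tenant-b': 50.0}
    print(spot_quota(512, forecasts))  # 285
```

Recomputing the quota every forecasting interval makes it elastic. When predicted HP demand rises, fewer GPUs are offered to spot jobs in advance, so fewer of them need to be evicted later.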

📝 Abstract
The surge in large language models (LLMs) has fundamentally reshaped GPU usage patterns, creating an urgent need for more efficient management strategies. While cloud providers employ spot instances to reduce costs for low-priority (LP) tasks, existing schedulers still grapple with high eviction rates and lengthy queuing times. To address these limitations, we present GFS, a novel preemptive scheduling framework that enhances service-level objective (SLO) compliance for high-priority (HP) tasks while minimizing preemptions to LP tasks. First, GFS utilizes a lightweight forecasting model that predicts GPU demand among different tenants, enabling proactive resource management. Second, GFS employs a dynamic allocation mechanism to adjust the spot quota for LP tasks with guaranteed durations. Finally, GFS incorporates a preemptive scheduling policy that prioritizes HP tasks while minimizing the impact on LP tasks. We demonstrate the effectiveness of GFS through both real-world implementation and simulations. The results show that GFS reduces eviction rates by 33.0% and cuts queuing delays by 44.1% for LP tasks. Furthermore, GFS enhances the GPU allocation rate by up to 22.8% in real production clusters. In a production cluster of more than 10,000 GPUs, GFS yields roughly $459,715 in monthly benefits.
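The abstract mentions spot quota for "LP tasks with guaranteed durations" but does not define how a duration guarantee is enforced. One plausible reading is an admission check against the forecast quota over the task's whole guaranteed window. The function `admit_lp` and its parameters are hypothetical. The sketch assumes the scheduler tracks, per future interval, both the predicted spot quota and the GPUs already promised to earlier LP tasks.

```python
from typing import List, Sequence


def admit_lp(request_gpus: int, duration: int,
             quota_forecast: Sequence[int], committed: List[int]) -> bool:
    """Hypothetical admission check for an LP task with a guaranteed duration.

    quota_forecast[t] is the predicted spot quota for interval t, and
    committed[t] is the GPUs already promised to earlier LP tasks in that
    interval (same length as quota_forecast). The task is admitted only if
    every interval in its guaranteed window still has room; on success its
    GPUs are recorded in `committed`.
    """
    if duration > len(quota_forecast):
        return False  # cannot promise anything beyond the forecast horizon
    window = range(duration)
    if any(committed[t] + request_gpus > quota_forecast[t] for t in window):
        return False
    for t in window:
        committed[t] += request_gpus
    return True


if __name__ == "__main__":
    quota = [64, 64, 32, 32]      # predicted spot GPUs per interval
    committed = [0] * len(quota)
    print(admit_lp(40, 2, quota, committed))  # True
    print(admit_lp(40, 3, quota, committed))  # False: interval 0 would need 80 > 64
    print(admit_lp(24, 3, quota, committed))  # True
    print(committed)                          # [64, 64, 24, 0]
```

In this sketch, a task that would overrun the forecast in any interval is rejected up front. It is never admitted and then evicted mid-window. This is one way a quota tied to guaranteed durations could trade some queuing for fewer evictions.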
Problem


Reducing high eviction rates for low-priority GPU tasks
Minimizing lengthy queuing delays in GPU cluster scheduling
Balancing preemption between high- and low-priority tasks
Innovation


Lightweight, tenant-level GPU demand forecasting model
Dynamic spot quota allocation for LP tasks with guaranteed durations
Preemptive scheduling that prioritizes HP tasks while minimizing impact on LP tasks
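The page does not describe how GFS picks which LP tasks to preempt. The sketch below is one hypothetical way to "minimize impact on LP tasks" when an HP request arrives at a node. It frees enough GPUs by evicting the subset of LP jobs with the least total lost work, breaking ties by evicting fewer jobs, and it never touches jobs still inside their guaranteed window. `LPJob`, `pick_victims`, the `lost_work` metric, and the never-evict-guaranteed rule are all assumptions, not the paper's policy.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LPJob:
    name: str
    gpus: int
    lost_work: float   # assumed impact metric, e.g. GPU-hours since last checkpoint
    guaranteed: bool   # still inside its guaranteed-duration window


def pick_victims(jobs: List[LPJob], free_gpus: int,
                 hp_request: int) -> Optional[List[LPJob]]:
    """Choose LP jobs on one node to preempt for an HP request.

    Returns [] if no preemption is needed, None if the request cannot be met
    without touching guaranteed jobs, otherwise the victim set with the least
    total lost work (ties broken by fewer victims). Exhaustive search is
    affordable only because a single node hosts a handful of jobs.
    """
    deficit = hp_request - free_gpus
    if deficit <= 0:
        return []
    candidates = [j for j in jobs if not j.guaranteed]  # assumed: never evict guaranteed jobs
    best_key: Optional[Tuple[float, int]] = None
    best_combo: Tuple[LPJob, ...] = ()
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            if sum(j.gpus for j in combo) < deficit:
                continue  # does not free enough GPUs
            key = (sum(j.lost_work for j in combo), len(combo))
            if best_key is None or key < best_key:
                best_key, best_combo = key, combo
    return list(best_combo) if best_key is not None else None


if __name__ == "__main__":
    node = [
        LPJob("a", 4, 10.0, False),
        LPJob("b", 2, 1.0, False),
        LPJob("c", 2, 1.5, False),
        LPJob("d", 4, 0.5, True),
    ]
    victims = pick_victims(node, free_gpus=0, hp_request=4)
    print([j.name for j in victims])  # ['b', 'c']
```

In the example, job `d` would be the cheapest single victim but is protected by its guarantee. Evicting `b` and `c` together (2.5 units of lost work) beats evicting `a` alone (10.0), so the sketch prefers two small preemptions over one costly one.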
Authors
Jiaang Duan (Shanghai Jiao Tong University, Shanghai, China)
Shenglin Xu (Shanghai Jiao Tong University, Shanghai, China)
Shiyou Qian (Shanghai Jiao Tong University; Computer Science)
Dingyu Yang (Zhejiang University; Database, Performance Evaluation, Distributed Processing)
Kangjin Wang (Alibaba Group, Hangzhou, China)
Chenzhi Liao (Alibaba Group, Hangzhou, China)
Yinghao Yu (Engineer, Alibaba; Resource management in containerized clusters, Generation optimizations for distributed systems)
Qin Hua (Shanghai Jiao Tong University, Shanghai, China)
Hanwen Hu (Shanghai Jiao Tong University, Shanghai, China)
Qi Wang (Alibaba Group, Hangzhou, China)
Wenchao Wu (Alibaba Group, Hangzhou, China)
Dongqing Bao (Alibaba Group, Hangzhou, China)
Tianyu Lu (University of Wisconsin-Madison; Artificial Intelligence, Computational Biology)
Jian Cao (Shanghai Jiao Tong University, Shanghai, China)
Guangtao Xue (Professor of Computer Science, Shanghai Jiao Tong University; Mobile Computing, Social Networks, Wireless Sensor Networks, Distributed Computing)
Guodong Yang (Alibaba Group, Hangzhou, China)
Liping Zhang (Alibaba Group, Hangzhou, China)
Gang Chen (Zhejiang University, Hangzhou, China)