A Real-Time Digital Twin for Adaptive Scheduling

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
HPC workloads are becoming increasingly heterogeneous, and static heuristic schedulers adapt poorly to their changing resource demands. This paper proposes SchedTwin, a real-time digital twin for HPC job scheduling. It periodically ingests runtime events from the physical scheduler to drive a high-fidelity discrete-event simulation, runs rapid online “what-if” evaluations of multiple scheduling policies, and selects the policy that satisfies the administrator-configured optimization goal, forming a closed adaptive loop. SchedTwin is open-source and integrated with the production PBS scheduler, with an overhead of a few seconds per scheduling cycle. Preliminary results show that it consistently outperforms widely used static scheduling policies.

📝 Abstract
High-performance computing (HPC) workloads are becoming increasingly diverse, exhibiting wide variability in job characteristics, yet cluster scheduling has long relied on static, heuristic-based policies. In this work, we present SchedTwin, a real-time digital twin designed to adaptively guide scheduling decisions using predictive simulation. SchedTwin periodically ingests runtime events from the physical scheduler, performs rapid what-if evaluations of multiple policies using a high-fidelity discrete-event simulator, and dynamically selects the one that satisfies the administrator-configured optimization goal. We implement SchedTwin as open-source software and integrate it with the production PBS scheduler. Preliminary results show that SchedTwin consistently outperforms widely used static scheduling policies, while maintaining low overhead (a few seconds per scheduling cycle). These results demonstrate that real-time digital twins offer a practical and effective path toward adaptive HPC scheduling.
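The what-if loop described in the abstract can be illustrated with a toy sketch. This is not SchedTwin's code: the names `Job`, `POLICIES`, `simulate`, `evaluate_policies`, and `select_policy` are hypothetical, the simulator is a deliberately simplified discrete-event model without backfilling, and mean wait time stands in for whatever goal an administrator might configure.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    submit: float   # submission time in seconds
    nodes: int      # nodes requested
    runtime: float  # estimated runtime in seconds


# Hypothetical candidate policies: each one is a sort key over the queue.
POLICIES = {
    "FCFS": lambda job: job.submit,   # first come, first served
    "SJF": lambda job: job.runtime,   # shortest job first
}


def simulate(queue, running, total_nodes, now, policy_key):
    """Replay a cluster snapshot under one policy; return the mean wait time.

    `running` is a list of (end_time, nodes) pairs for jobs already executing.
    Assumption: strict priority order, no backfilling.
    """
    if any(job.nodes > total_nodes for job in queue):
        raise ValueError("a queued job requests more nodes than the cluster has")
    free = total_nodes - sum(nodes for _, nodes in running)
    events = list(running)            # min-heap of job-completion events
    heapq.heapify(events)
    pending = sorted(queue, key=policy_key)
    clock, waits = now, []
    while pending:
        # Start jobs from the head of the queue while they fit.
        while pending and pending[0].nodes <= free:
            job = pending.pop(0)
            free -= job.nodes
            waits.append(clock - job.submit)
            heapq.heappush(events, (clock + job.runtime, job.nodes))
        if pending:
            # Advance simulated time to the next completion event.
            clock, released = heapq.heappop(events)
            free += released
    return sum(waits) / len(waits) if waits else 0.0


def evaluate_policies(queue, running, total_nodes, now):
    """Run one what-if simulation per policy on the same snapshot."""
    return {name: simulate(queue, running, total_nodes, now, key)
            for name, key in POLICIES.items()}


def select_policy(queue, running, total_nodes, now):
    """Pick the policy with the lowest simulated mean wait (assumed goal)."""
    scores = evaluate_policies(queue, running, total_nodes, now)
    return min(scores, key=scores.get)
```

In a real deployment, the snapshot (`queue`, `running`) would be rebuilt from the scheduler's event stream at each cycle and the selected policy handed back to the scheduler; the sketch only covers the evaluate-and-select step in between.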
Problem

Research questions and friction points this paper is trying to address.

Adaptive scheduling for diverse HPC workloads
Real-time digital twin guides scheduling decisions
Dynamic policy selection to meet optimization goals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time digital twin for adaptive scheduling decisions
Uses predictive simulation with high-fidelity discrete-event simulator
Dynamically selects policies based on administrator optimization goals