HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling

📅 2025-08-27
🤖 AI Summary
Problem: Quantifying the coupled effects of scheduling policies, incentive mechanisms, and physical infrastructure (power and cooling) in HPC systems before deployment remains challenging. Method: This paper introduces the first digital twin framework for HPC that natively integrates scheduling functionality. It embeds scheduling logic within the twin itself, interfaces with external schedulers via standardized APIs, and incorporates machine learning-based schedulers trained on multi-source, publicly available HPC operational data to enable cross-layer co-simulation. Contribution/Results: It enables the first repeatable, verifiable virtual assessment of how scheduling policies and incentive structures impact energy efficiency and cooling load. The framework has been applied to model multiple real-world HPC systems. Empirical evaluation demonstrates that the proposed ML-driven scheduler significantly outperforms baseline strategies in jointly optimizing resource utilization, energy consumption, and cooling demand, providing a scalable methodological foundation for green, sustainable supercomputing operations.

📝 Abstract
Schedulers are critical for optimal resource utilization in high-performance computing. Traditional methods to evaluate schedulers are limited to post-deployment analysis or to simulators that do not model the associated infrastructure. In this work, we present the first-of-its-kind integration of scheduling and digital twins in HPC. This enables what-if studies to understand the impact of parameter configurations and scheduling decisions on the physical assets, even before deployment, or regarding changes not easily realizable in production. We (1) provide the first digital twin framework extended with scheduling capabilities, (2) integrate various top-tier HPC systems given their publicly available datasets, and (3) implement extensions to integrate external scheduling simulators. Finally, we show how to (4) implement and evaluate incentive structures, as well as (5) evaluate machine-learning-based scheduling, in this novel digital-twin-based meta-framework for prototyping scheduling. Our work enables what-if scenarios of HPC systems to evaluate sustainability and the impact on the simulated system.

Problem

Research questions and friction points this paper is trying to address.

Evaluating HPC scheduling policies and incentive structures pre-deployment
Assessing impact of scheduling decisions on power and cooling infrastructure
Enabling what-if studies for HPC system sustainability through digital twins
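
To make the idea of a what-if study concrete, the following is a minimal sketch of coupling a scheduling policy to a simple power model. It is not the paper's framework: the `Job` fields, the `simulate` function, the per-node wattages, and the flat `cooling_overhead` factor are all hypothetical assumptions chosen for illustration, and the scheduler does no backfilling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submit: int            # submit time in seconds
    nodes: int             # number of nodes requested
    runtime: int           # runtime in seconds
    watts_per_node: float  # assumed active power draw per node


def simulate(jobs, total_nodes, policy, idle_watts=100.0, cooling_overhead=0.2):
    """Toy 1-second time-stepped simulation (illustrative assumptions only).

    `policy` is a sort key over queued jobs. Cooling is modeled as a flat
    fractional overhead on IT power, which is a strong simplification.
    """
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the system has")
    pending = sorted(jobs, key=lambda j: j.submit)
    queue, running = [], []  # running holds (end_time, job) pairs
    free, t, peak_w, energy_j = total_nodes, 0, 0.0, 0.0
    while pending or queue or running:
        # Release nodes of jobs that have finished by time t.
        for end, job in running:
            if end <= t:
                free += job.nodes
        running = [(end, job) for end, job in running if end > t]
        # Move newly submitted jobs into the queue.
        while pending and pending[0].submit <= t:
            queue.append(pending.pop(0))
        # Order the queue by policy; start jobs strictly in order (no backfill).
        queue.sort(key=policy)
        while queue and queue[0].nodes <= free:
            job = queue.pop(0)
            free -= job.nodes
            running.append((t + job.runtime, job))
        if not (pending or queue or running):
            break
        # Power for this step: idle nodes plus active nodes, plus cooling overhead.
        it_w = free * idle_watts + sum(j.nodes * j.watts_per_node for _, j in running)
        total_w = it_w * (1 + cooling_overhead)
        peak_w = max(peak_w, total_w)
        energy_j += total_w  # one-second step, so watts equal joules here
        t += 1
    return {"makespan_s": t, "peak_kw": peak_w / 1000, "energy_kwh": energy_j / 3.6e6}


def fcfs(job):
    return job.submit   # first come, first served


def sjf(job):
    return job.runtime  # shortest job first


JOBS = [
    Job("X", submit=0, nodes=2, runtime=6, watts_per_node=400.0),
    Job("Y", submit=0, nodes=4, runtime=2, watts_per_node=400.0),
    Job("Z", submit=0, nodes=2, runtime=6, watts_per_node=400.0),
]

for label, policy in (("FCFS", fcfs), ("SJF", sjf)):
    print(label, simulate(JOBS, total_nodes=4, policy=policy))
```

On this toy workload, shortest-job-first finishes in 8 s versus 14 s for first-come-first-served, and it uses less energy because fewer node-seconds are spent idle. A real digital twin would replace the flat overhead with a physical cooling model and replay recorded job traces.
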

Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital twin framework with scheduling capabilities
Integration of top-tier HPC systems datasets
Extensions for external scheduling simulators integration
Matthias Maiterth
Oak Ridge National Laboratory
High Performance Computing, Parallel Computing, Energy Efficiency, Scalable Performance Tools, Computer Architecture
Wesley H. Brewer
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Jaya S. Kuruvella
Texas State University, San Marcos, Texas, USA
Arunavo Dey
Texas State University, San Marcos, Texas, USA
Tanzima Z. Islam
Texas State University, San Marcos, Texas, USA
Kevin Menear
National Renewable Energy Laboratory, Golden, Colorado, USA
Dmitry Duplyakin
NREL, University of Utah
High Performance Computing, Cloud Computing, Machine Learning, Data Science
Rashadul Kabir
Colorado State University, Fort Collins, Colorado, USA
Tapasya Patki
Lawrence Livermore National Laboratory
High Performance Computing
Terry Jones
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Feiyi Wang
Distinguished Research Scientist & Group Leader, Analytics and AI Methods at Scale, NCCS/ORNL
HPC, AI for Science at Scale