Power-Aware Scheduling for Multi-Center HPC Electricity Cost Optimization

📅 2025-03-14
🤖 AI Summary
Problem: High electricity costs severely hinder the sustainability of multi-site high-performance computing (HPC) systems. Method: This paper proposes TARDIS, the first scheduler to integrate graph neural network (GNN) based prediction of job power consumption with a spatiotemporal cooperative scheduling framework across multiple HPC centers. TARDIS models dynamic job power profiles via GNNs and jointly optimizes task placement across time (leveraging time-of-use electricity pricing) and space (exploiting geographic price differentials) using time-varying electricity price modeling and multi-objective integer programming. Contribution/Results: Unlike conventional single-site or single-dimensional schedulers, TARDIS achieves substantial cost reductions in trace-driven simulations: up to 18% savings in single-center temporal optimization and 10–20% in multi-center scenarios, while maintaining stable throughput and application performance. The approach enables scalable, cost-efficient, and sustainable HPC operations.

📝 Abstract
This paper introduces TARDIS (Temporal Allocation for Resource Distribution using Intelligent Scheduling), a novel power-aware job scheduler for High-Performance Computing (HPC) systems that minimizes electricity costs through both temporal and spatial optimization. Our approach addresses growing concern over energy consumption in HPC centers, where electricity expenses constitute a substantial portion of operational costs. TARDIS employs a Graph Neural Network (GNN) to predict individual job power consumption, then uses these predictions to schedule jobs across multiple HPC facilities based on time-varying electricity prices. The system integrates both temporal scheduling, which shifts power-intensive workloads to off-peak hours, and spatial scheduling, which distributes jobs across geographically dispersed centers with different pricing schemes. We evaluate TARDIS using trace-based simulations driven by real HPC workloads, demonstrating cost reductions of up to 18% in temporal optimization scenarios and 10 to 20% in multi-site environments compared to state-of-the-art scheduling approaches, while maintaining comparable system performance and job throughput. Our evaluation shows that TARDIS addresses limitations in existing power-aware scheduling approaches by combining accurate power prediction with holistic spatial-temporal optimization, providing a scalable solution for sustainable and cost-efficient HPC operations.
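To make the temporal and spatial idea concrete, here is a minimal Python sketch. It is not the TARDIS algorithm: it is a brute-force search for a single job, it ignores site capacity and queueing, and it takes the predicted power as a given number rather than a GNN output. The names `Job`, `job_cost`, and `cheapest_slot`, the hourly price lists, and the deadline field are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    power_kw: float   # predicted average power draw (assumed to come from a predictor)
    duration_h: int   # runtime in whole hours
    deadline_h: int   # hour by which the job must have finished


def job_cost(job, prices, start):
    """Electricity cost of running `job` from hour `start` under hourly `prices`."""
    return sum(job.power_kw * prices[h] for h in range(start, start + job.duration_h))


def cheapest_slot(job, site_prices):
    """Return (cost, site, start_hour) with the lowest cost, or None if nothing fits.

    Scanning start hours is the temporal dimension; scanning sites is the
    spatial dimension. Capacity limits are deliberately left out.
    """
    best = None
    for site, prices in site_prices.items():
        latest_start = min(job.deadline_h, len(prices)) - job.duration_h
        for start in range(latest_start + 1):
            candidate = (job_cost(job, prices, start), site, start)
            if best is None or candidate < best:
                best = candidate
    return best


# Hypothetical prices per kWh over a 6-hour horizon.
site_prices = {
    "A": [0.30, 0.30, 0.10, 0.10, 0.30, 0.30],  # time-of-use tariff with an off-peak dip
    "B": [0.20, 0.20, 0.20, 0.20, 0.20, 0.20],  # flat tariff
}

flexible = Job("flexible", power_kw=100.0, duration_h=2, deadline_h=6)
urgent = Job("urgent", power_kw=100.0, duration_h=2, deadline_h=2)

print(cheapest_slot(flexible, site_prices))  # waits for site A's off-peak window
print(cheapest_slot(urgent, site_prices))    # must start now, so the flat-rate site B wins
```

In this toy setup the job with slack is shifted in time to the off-peak window at site A, while the job with a tight deadline is shifted in space to site B. A real scheduler would also have to account for node availability, data movement, and fairness.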
Problem

Research questions and friction points this paper is trying to address.

Minimizing electricity costs in HPC systems
Predicting job power consumption with a GNN
Optimizing job scheduling across multiple HPC centers
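
One way to write the cost being minimized, as a sketch only: the notation below is hypothetical and is not taken from the paper. It assumes hourly time steps and a constant predicted power per job.

```latex
% Hypothetical notation, not taken from the paper.
% \hat{P}_j      : predicted average power of job j (kW)
% c_j, s_j, d_j  : assigned center, start hour, and duration (hours) of job j
% \pi_c(t)       : electricity price at center c during hour t (currency per kWh)
\[
  C \;=\; \sum_{j} \; \sum_{t = s_j}^{s_j + d_j - 1} \hat{P}_j \, \pi_{c_j}(t) \, \Delta t,
  \qquad \Delta t = 1\ \text{hour}
\]
```

Under this reading, temporal optimization chooses the start hours s_j with the centers fixed, and spatial optimization additionally chooses the centers c_j.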
Innovation

Methods, ideas, or system contributions that make the work stand out.

TARDIS scheduler optimizes HPC electricity costs
Uses GNN for job power consumption prediction
Combines temporal and spatial job scheduling