CarbonFlex: Enabling Carbon-aware Provisioning and Scheduling for Cloud Clusters

📅 2025-05-23
🤖 AI Summary
This work addresses carbon-aware resource provisioning and scheduling for batch jobs in cloud computing clusters. Method: We propose the first cluster-level joint optimization framework that integrates carbon intensity time-series forecasting with reinforcement learning to jointly determine online decisions on resource elasticity (scale-in/scale-out) and job start/stop timing, while respecting job delay tolerance and system elasticity. We introduce a continual learning mechanism that dynamically adapts to real-time carbon intensity variations using historical cluster data, and design an extended architecture based on AWS ParallelCluster with job suspension/resumption capabilities. Contribution/Results: Evaluated on real industrial workloads, our approach reduces carbon emissions by 57% compared to carbon-agnostic baselines and achieves 97.9% of the performance of an ideal oracle, significantly improving the efficiency of green compute scheduling.

📝 Abstract
Accelerating computing demand, largely from AI applications, has led to concerns about its carbon footprint. Fortunately, a significant fraction of computing demand comes from batch jobs that are often delay-tolerant and elastic, which enables schedulers to reduce carbon by suspending/resuming jobs and scaling their resources down/up when carbon is high/low. However, prior work on carbon-aware scheduling generally focuses on optimizing carbon for individual jobs in the cloud, and not provisioning and scheduling resources for many parallel jobs in cloud clusters. To address the problem, we present CarbonFlex, a carbon-aware resource provisioning and scheduling approach for cloud clusters. CarbonFlex leverages continuous learning over historical cluster-level data to drive near-optimal runtime resource provisioning and job scheduling. We implement CarbonFlex by extending AWS ParallelCluster to include our carbon-aware provisioning and scheduling algorithms. Our evaluation on publicly available industry workloads shows that CarbonFlex decreases carbon emissions by $\sim$57% compared to a carbon-agnostic baseline and performs within 2.1% of an oracle scheduler with perfect knowledge of future carbon intensity and job length.
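
To make the suspend/resume idea in the abstract concrete, the following is a minimal sketch for a single delay-tolerant job. It is not CarbonFlex's algorithm: it assumes the hourly carbon-intensity trace is fully known in advance (closer to the oracle baseline than to an online scheduler), and every name and number in it (`emissions`, `agnostic_schedule`, `carbon_aware_schedule`, the intensity values, the 1 kW power draw) is hypothetical and chosen only for illustration.

```python
from typing import List


def emissions(slots: List[int], intensity: List[float], power_kw: float = 1.0) -> float:
    """Total gCO2 for a job that runs during the hourly slots in `slots`.

    Each slot lasts one hour, so energy per slot is `power_kw` kWh.
    """
    return sum(intensity[t] * power_kw for t in slots)


def agnostic_schedule(length: int) -> List[int]:
    """Carbon-agnostic baseline: start at hour 0 and run without interruption."""
    return list(range(length))


def carbon_aware_schedule(length: int, deadline: int, intensity: List[float]) -> List[int]:
    """Run in the `length` lowest-carbon hours before `deadline`.

    The job is treated as suspended in every hour that is not selected.
    Assumption: the whole intensity trace is known, which a real online
    scheduler would not have.
    """
    window = range(min(deadline, len(intensity)))
    chosen = sorted(window, key=lambda t: intensity[t])[:length]
    return sorted(chosen)


# Hypothetical carbon intensity in gCO2/kWh for six consecutive hours.
intensity = [400.0, 350.0, 200.0, 150.0, 300.0, 100.0]

base = agnostic_schedule(3)
aware = carbon_aware_schedule(3, deadline=6, intensity=intensity)

base_emissions = emissions(base, intensity)
aware_emissions = emissions(aware, intensity)

print(f"carbon-agnostic: hours {base}, {base_emissions:.0f} gCO2")
print(f"carbon-aware:    hours {aware}, {aware_emissions:.0f} gCO2")
```

In this toy trace, a 3-hour job with a 6-hour deadline is suspended during the three dirtiest hours. The sketch covers only job timing for one job; the cluster-level problem described in the abstract also involves scaling resources up and down across many parallel jobs without knowing future carbon intensity or job lengths.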
Problem

Research questions and friction points this paper is trying to address.

Optimizing carbon footprint for cloud cluster batch jobs
Provisioning and scheduling resources for parallel cloud jobs
Reducing emissions via dynamic resource scaling and suspension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Carbon-aware resource provisioning for cloud clusters
Continuous learning from historical cluster data
Extends AWS ParallelCluster with scheduling algorithms