Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of a comprehensive benchmarking framework for multi-agent reinforcement learning (MARL) in urban energy management, which hinders accurate evaluation of algorithmic performance across multidimensional metrics. The authors establish an evaluation framework within the CityLearn environment that incorporates both conventional and novel key performance indicators—including individual building contributions and battery lifespan—and systematically compare mainstream MARL algorithms such as PPO and SAC under decentralized training/decentralized execution (DTDE) and centralized training/decentralized execution (CTDE) paradigms. They further introduce temporal dependency modeling to enhance memory-aware control objectives. Experimental results demonstrate that DTDE consistently outperforms CTDE in both average and worst-case performance; temporal modeling significantly improves ramping smoothness and battery utilization efficiency; and the learned policies exhibit strong robustness to agent or resource dropouts, confirming the practical viability and advantages of decentralized control in real-world deployment.

📝 Abstract
The optimization of urban energy systems is crucial for the advancement of sustainable and resilient smart cities, which are becoming increasingly complex with multiple decision-making units. To address scalability and coordination concerns, Multi-Agent Reinforcement Learning (MARL) is a promising solution. This paper addresses the imperative need for comprehensive and reliable benchmarking of MARL algorithms on energy management tasks. CityLearn is used as a case study environment because it realistically simulates urban energy systems, incorporates multiple storage systems, and utilizes renewable energy sources. By doing so, our work sets a new standard for evaluation, conducting a comparative study across multiple key performance indicators (KPIs). This approach illuminates the key strengths and weaknesses of various algorithms, moving beyond traditional KPI averaging, which often masks critical insights. Our experiments utilize widely accepted baselines such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), and encompass diverse training schemes, including Decentralized Training with Decentralized Execution (DTDE) and Centralized Training with Decentralized Execution (CTDE), as well as different neural network architectures. Our work also proposes novel KPIs that tackle real-world implementation challenges such as individual building contribution and battery storage lifetime. Our findings show that DTDE consistently outperforms CTDE in both average and worst-case performance. Additionally, temporal dependency learning improved control on memory-dependent KPIs such as ramping and battery usage, contributing to more sustainable battery operation. Results also reveal robustness to agent or resource removal, highlighting both the resilience and decentralizability of the learned policies.
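The abstract highlights memory-dependent KPIs such as ramping and battery usage. As a rough illustration only (the paper's exact formulas are not given here, and the function names and normalization are assumptions, not CityLearn's API), a ramping score and a simple throughput-based battery-wear proxy could be sketched as:

```python
import numpy as np

def ramping(net_load):
    """Sum of absolute step-to-step changes in district net load (kW).

    Lower is smoother. Illustrative sketch: benchmark environments such as
    CityLearn typically normalize scores against a no-control baseline,
    which is omitted here.
    """
    net_load = np.asarray(net_load, dtype=float)
    return float(np.abs(np.diff(net_load)).sum())

def equivalent_full_cycles(soc, capacity_kwh):
    """Hypothetical battery-lifetime-style KPI: equivalent full cycles,
    estimated from total energy moved through the battery.

    soc is the state-of-charge trajectory in [0, 1]; one full cycle
    corresponds to charging plus discharging the full capacity once.
    """
    soc = np.asarray(soc, dtype=float)
    energy_moved_kwh = np.abs(np.diff(soc)).sum() * capacity_kwh
    return float(energy_moved_kwh / (2.0 * capacity_kwh))

# Toy example: a 5-step net-load profile and a short SoC trajectory.
print(ramping([4.0, 6.5, 5.0, 9.0, 3.0]))             # 14.0
print(equivalent_full_cycles([0.0, 0.5, 1.0, 0.2], 10.0))  # 0.9
```

Metrics like these depend only on the trajectory history, not on a single timestep, which is why the paper finds that temporal dependency modeling (memory-aware policies) improves them.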
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Energy Management
Benchmarking
Key Performance Indicators
Smart Cities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning
CityLearn
Key Performance Indicators
Decentralized Training
Battery Lifetime
Aymen Khouja — InstaDeep
Imen Jendoubi — InstaDeep
Oumayma Mahjoub — InstaDeep
Oussama Mahfoudhi — InstaDeep
Claude Formanek — University of Cape Town (Multi-Agent Reinforcement Learning, Reinforcement Learning, Offline Reinforcement Learning)
Siddarth Singh — Research engineer at InstaDeep Ltd (Reinforcement Learning)
Ruan De Kock — InstaDeep