AI Summary
High-fidelity multi-gas temperature models (e.g., CICERO-SCM) incur prohibitive computational costs, hindering their integration into reinforcement learning (RL) frameworks for climate policy optimization.
Method: We propose a lightweight, recurrent neural network-based climate surrogate model, pre-trained on 20,000 multi-gas emission trajectories from CICERO-SCM.
Contribution/Results: The surrogate achieves high-fidelity global mean temperature prediction (RMSE ≈ 0.0004 K) while accelerating inference by ~1000× and converging to the same optimal policy as the original simulator. Integrated into a multi-agent RL framework, it enables real-time temperature response modeling within the environmental loop. End-to-end training efficiency improves by over 100×, facilitating scalable, multi-scenario, regional climate policy co-optimization. This significantly advances the frontier of scalable climate-aware intelligent agents.
Abstract
Climate policy studies require models that capture the combined effects of multiple greenhouse gases on global temperature, but these models are computationally expensive and difficult to embed in reinforcement learning. We present a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity, highly efficient climate surrogate directly in the environment loop, enabling regional agents to learn climate policies under multi-gas dynamics. As a proof of concept, we introduce a recurrent neural network architecture pretrained on $20{,}000$ multi-gas emission pathways to act as a surrogate for the climate model CICERO-SCM. The surrogate attains near-simulator accuracy, with a global-mean temperature RMSE of $\approx 0.0004\,\mathrm{K}$ and approximately $1000\times$ faster one-step inference. When substituted for the original simulator in a climate-policy MARL setting, it accelerates end-to-end training by $>100\times$. We show that the surrogate and simulator converge to the same optimal policies and propose a methodology to assess this property in cases where using the simulator is intractable. Our work bypasses the core computational bottleneck without sacrificing policy fidelity, enabling large-scale multi-agent experiments across alternative climate-policy regimes with multi-gas dynamics and high-fidelity climate response.
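To make the "surrogate in the environment loop" idea concrete, here is a minimal sketch of how a recurrent surrogate might be stepped one time step at a time and scored against reference temperatures with RMSE. It is not the architecture from the paper: the class `RecurrentSurrogate`, the helper `rollout`, the plain Elman-style cell, the layer sizes, and the random (untrained) weights are all assumptions chosen for illustration.

```python
import math
import random


class RecurrentSurrogate:
    """Toy Elman-style recurrent cell: per-gas emissions in, temperature out.

    Hypothetical stand-in for a pretrained surrogate. The weights are random
    placeholders, not trained values, so the outputs have no physical meaning.
    """

    def __init__(self, n_gases, n_hidden, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.n_hidden = n_hidden
        self.w_in = init(n_hidden, n_gases)    # emissions -> hidden
        self.w_rec = init(n_hidden, n_hidden)  # hidden -> hidden
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def initial_state(self):
        return [0.0] * self.n_hidden

    def step(self, emissions, hidden):
        """Advance one time step; return (temperature, new hidden state)."""
        new_hidden = []
        for i in range(self.n_hidden):
            pre = sum(w * e for w, e in zip(self.w_in[i], emissions))
            pre += sum(w * h for w, h in zip(self.w_rec[i], hidden))
            new_hidden.append(math.tanh(pre))
        temperature = sum(w * h for w, h in zip(self.w_out, new_hidden))
        return temperature, new_hidden


def rollout(model, pathway):
    """Run the surrogate over an emission pathway, one step per entry."""
    hidden = model.initial_state()
    temperatures = []
    for emissions in pathway:
        temperature, hidden = model.step(emissions, hidden)
        temperatures.append(temperature)
    return temperatures


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Usage: three gases, a constant five-step pathway (made-up numbers).
model = RecurrentSurrogate(n_gases=3, n_hidden=4, seed=1)
pathway = [[1.0, 0.5, 0.2] for _ in range(5)]
temperatures = rollout(model, pathway)
```

Because the hidden state is carried between calls to `step`, an environment only needs to pass in the current emissions at each step instead of re-running a full simulation. In a real setup the reference sequence passed to `rmse` would come from the simulator itself.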