🤖 AI Summary
Climate science urgently requires automated analytical frameworks capable of handling large-scale, heterogeneous data, yet existing general-purpose LLM agents and static scripts lack domain specificity and dynamic collaboration capabilities. To address this, we propose ClimateAgent: the first dynamic multi-agent framework tailored for climate data science. It automates the full workflow end to end, from problem understanding and data acquisition to code generation and report synthesis. It does this through API-aware task decomposition, a self-correcting execution loop, and coordinated operation of four specialized agent types (Orchestration, Planning, Data, and Coding). We introduce Climate-Agent-Bench-85, the first real-world benchmark comprising 85 complex climate science tasks. On this benchmark, ClimateAgent achieves a 100% task completion rate and an average report quality score of 8.32, significantly outperforming baseline methods including GitHub Copilot (6.27) and GPT-5 (3.26).
📝 Abstract
Climate science demands automated workflows that transform broad scientific questions into data-driven statements across massive, heterogeneous datasets. However, generic LLM agents and static scripting pipelines lack climate-specific context and flexibility, and therefore perform poorly in practice. We present ClimateAgent, an autonomous multi-agent framework that orchestrates end-to-end climate data analysis workflows. ClimateAgent decomposes user questions into executable sub-tasks coordinated by an Orchestrate-Agent and a Plan-Agent; acquires data through specialized Data-Agents that dynamically introspect APIs to synthesize robust download scripts; and completes analysis and reporting with a Coding-Agent that generates Python code, visualizations, and a final report, using a built-in self-correction loop. To enable systematic evaluation, we introduce Climate-Agent-Bench-85, a benchmark of 85 real-world tasks spanning atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. On Climate-Agent-Bench-85, ClimateAgent achieves 100% task completion and a report quality score of 8.32, outperforming GitHub Copilot (6.27) and a GPT-5 baseline (3.26). These results demonstrate that our multi-agent orchestration with dynamic API awareness and self-correcting execution substantially advances reliable, end-to-end automation of climate science analysis tasks.
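The control flow described above can be sketched in a few lines. In this flow, a planner splits a question into sub-tasks, and a coding step retries failed code by feeding the error message back into the next generation. The Python below is a minimal, hypothetical sketch of that self-correction pattern, not ClimateAgent's actual implementation. The names `SubTask`, `ExecutionResult`, `plan`, `self_correcting_run`, `orchestrate`, `toy_generate`, and `toy_execute` are all illustrative assumptions. A deterministic stub stands in for the LLM, and a fixed two-step plan (one data step, one code step) stands in for real task decomposition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    description: str
    kind: str  # hypothetical categories: "data" or "code"


@dataclass
class ExecutionResult:
    ok: bool
    output: str = ""
    error: str = ""


def plan(question: str) -> list[SubTask]:
    """Hypothetical planner: a fixed data step followed by an analysis step."""
    return [
        SubTask(f"download data for: {question}", "data"),
        SubTask(f"analyze and plot: {question}", "code"),
    ]


def self_correcting_run(
    generate: Callable[[str, str], str],
    execute: Callable[[str], ExecutionResult],
    task: str,
    max_attempts: int = 3,
) -> tuple[ExecutionResult, int]:
    """Generate code, run it, and pass any error back as feedback for a retry."""
    feedback = ""
    result = ExecutionResult(ok=False, error="no attempts made")
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        result = execute(code)
        if result.ok:
            return result, attempt
        feedback = result.error  # error text becomes context for the next attempt
    return result, max_attempts


def orchestrate(question, generate, execute) -> dict[str, ExecutionResult]:
    """Run every planned sub-task through the self-correction loop."""
    results: dict[str, ExecutionResult] = {}
    for sub in plan(question):
        result, _ = self_correcting_run(generate, execute, sub.description)
        results[sub.description] = result
    return results


def toy_generate(task: str, feedback: str) -> str:
    # Stand-in for an LLM: emits code missing an import, then fixes it after feedback.
    if "NameError" in feedback:
        return "import math\nresult = math.sqrt(16)"
    return "result = math.sqrt(16)"


def toy_execute(code: str) -> ExecutionResult:
    # Runs generated code in a fresh namespace and reports any exception as text.
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return ExecutionResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    return ExecutionResult(ok=True, output=str(namespace.get("result")))
```

With these stubs, `self_correcting_run(toy_generate, toy_execute, "compute")` fails once with a `NameError` and then succeeds on the second attempt. Capping retries with `max_attempts` keeps a persistently failing sub-task from looping forever. In that case the last error is returned so an orchestrator can decide what to do next.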