AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need

2025-09-25
AI Summary
Climate data science has long suffered from fragmented data sources, heterogeneous formats, and high technical barriers, which impede broad scientific participation, discovery efficiency, and reproducibility. To address these challenges, we propose a "knowledge graph is all you need" paradigm, designing a cloud-native AI workflow centered on a curated, domain-specific knowledge graph. The framework integrates generative AI, natural language understanding, and AI agent technologies to enable end-to-end user intent parsing and automated data discovery, retrieval, and analysis. It supports natural-language interaction and community-driven curation and sharing, substantially lowering entry barriers for non-expert users. The proof-of-concept demonstration indicates improvements in research reproducibility, scalability, and the effectiveness of human-AI collaboration. Our work establishes a reusable methodology and practical implementation for scientific AI (SciAI), advancing interoperable, knowledge-grounded automation in climate science.

πŸ“ Abstract
Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents, powered by generative AI services, enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that "a knowledge graph is all you need" to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human-AI collaboration in scientific research.
Problem

Research questions and friction points this paper is trying to address.

Overcoming fragmented climate data sources and heterogeneous formats
Reducing technical expertise needed for data identification and processing
Enabling non-specialists to access and analyze climate datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using a knowledge graph to unify datasets, tools, and workflows
AI agents enable natural language interaction and automation
Leveraging cloud APIs for scalable agentic scientific workflows
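The innovation points above describe a knowledge graph that links datasets, tools, and workflows, with an agent layer that maps user requests onto it. As a rough illustration only, the sketch below models such a graph in memory. Everything in it is hypothetical: the class names, the `processed_by` relation, the dataset and tool IDs, and the simple keyword overlap that stands in for the generative-AI intent parsing described in the abstract. It is not the AutoClimDS implementation or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical KG node: a dataset, tool, or workflow with search keywords."""
    node_id: str
    kind: str  # "dataset", "tool", or "workflow"
    keywords: set[str] = field(default_factory=set)


class ClimateKG:
    """Toy knowledge graph: nodes plus typed, directed edges (source, relation, target)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[tuple[str, str, str]] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def find_datasets(self, query: str) -> list[str]:
        # Keyword overlap is a placeholder for LLM-based intent parsing (assumption).
        words = set(query.lower().split())
        hits = [
            n.node_id
            for n in self.nodes.values()
            if n.kind == "dataset" and n.keywords & words
        ]
        return sorted(hits)

    def tools_for(self, dataset_id: str) -> list[str]:
        # Follow hypothetical "processed_by" edges from a dataset to its tools.
        return sorted(t for s, r, t in self.edges if s == dataset_id and r == "processed_by")


# Usage with made-up entries.
kg = ClimateKG()
kg.add_node(Node("sst_monthly", "dataset", {"sea", "surface", "temperature", "ocean"}))
kg.add_node(Node("precip_daily", "dataset", {"precipitation", "rainfall"}))
kg.add_node(Node("regrid_tool", "tool", {"regrid"}))
kg.add_edge("sst_monthly", "processed_by", "regrid_tool")

print(kg.find_datasets("ocean temperature trends"))  # ['sst_monthly']
print(kg.tools_for("sst_monthly"))  # ['regrid_tool']
```

A real deployment would more plausibly sit on a graph database and route queries through a generative AI service rather than keyword matching. The toy version only shows how typed edges let an agent move from a matched dataset to the tools that can process it.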
Ahmed Jaber
Association of Palestinian Local Authorities
Transportation Engineering, Road Safety, Micromobility, Travel Behaviour, SDGs
Wangshu Zhu
NSF STC Learning the Earth with AI and Physics (LEAP), Columbia University, New York, NY, USA
Karthick Jayavelu
AWS Generative AI Innovation Center, Seattle, WA, USA
Justin Downes
AWS Generative AI Innovation Center, Seattle, WA, USA
Sameer Mohamed
AWS Generative AI Innovation Center, Seattle, WA, USA
Candace Agonafir
NSF STC Learning the Earth with AI and Physics (LEAP), Columbia University, New York, NY, USA
Linnia Hawkins
NSF STC Learning the Earth with AI and Physics (LEAP), Columbia University, New York, NY, USA
Tian Zheng
NSF STC Learning the Earth with AI and Physics (LEAP), Columbia University, New York, NY, USA