AI Summary
Climate data science has long suffered from fragmented data sources, heterogeneous formats, and high technical barriers, impeding broad scientific participation, discovery efficiency, and reproducibility. To address these challenges, we propose a "one knowledge graph suffices" paradigm, designing a cloud-native AI workflow centered on a curated, domain-specific knowledge graph. This framework integrates generative AI, natural language understanding, and AI agent technologies to enable end-to-end user intent parsing, automated data discovery, retrieval, and analysis. It supports natural-language interaction and community-driven curation and sharing, substantially lowering entry barriers for non-expert users. Experimental evaluation demonstrates significant improvements in research reproducibility, scalability, and human-AI collaboration efficacy. Our work establishes a reusable methodology and practical implementation for scientific AI (SciAI), advancing interoperable, knowledge-grounded automation in climate science.
Abstract
Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents, powered by generative AI services, enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that "a knowledge graph is all you need" to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human-AI collaboration in scientific research.