🤖 AI Summary
Earth science data are growing rapidly, yet their reusability remains limited. To address this challenge, this work proposes PANGAEA-GPT, a hierarchical multi-agent system with a centralized supervisor-worker architecture. The system combines data-type-aware routing, sandboxed deterministic code execution, and self-correction driven by execution feedback to orchestrate complex, multi-step analytical workflows autonomously. In use-case scenarios from physical oceanography and ecology, PANGAEA-GPT carries out end-to-end data analysis with minimal human intervention, with the aim of improving the discoverability and usability of heterogeneous geoscientific datasets.
📝 Abstract
The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, our architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, enabling agents to diagnose and resolve runtime errors. Through use-case scenarios spanning physical oceanography and ecology, we demonstrate the system's capacity to execute complex, multi-step workflows with minimal human intervention. This framework provides a methodology for querying and analyzing heterogeneous repository data through coordinated agent workflows.
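The abstract names three mechanisms: data-type-aware routing, isolated code execution, and self-correction via execution feedback. The sketch below illustrates how they could fit together in a few lines of Python. It is not the PANGAEA-GPT implementation: the routing table, the worker names, `run_isolated`, `solve_with_feedback`, and `toy_generator` are all hypothetical, a separate interpreter process stands in for a real sandbox, and a hard-coded function stands in for the LLM that would generate code.

```python
import subprocess
import sys
from typing import Callable, Dict, Tuple

# Hypothetical routing table: data type -> worker name (illustrative only).
ROUTES: Dict[str, str] = {
    "netcdf": "oceanography_worker",
    "tabular": "ecology_worker",
}


def route(data_type: str) -> str:
    """Strict routing: unknown data types are rejected rather than guessed."""
    try:
        return ROUTES[data_type]
    except KeyError:
        raise ValueError(f"no worker registered for data type {data_type!r}")


def run_isolated(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a separate interpreter process.

    This is only a stand-in for a real sandbox: it isolates interpreter
    state but does not restrict file or network access.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def solve_with_feedback(
    task: str,
    generate: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Generate code, execute it, and feed any error text back to the generator."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        ok, output = run_isolated(code)
        if ok:
            return output, attempt
        feedback = output  # the traceback becomes context for the next attempt
    raise RuntimeError(f"task failed after {max_attempts} attempts: {feedback}")


def toy_generator(task: str, feedback: str) -> str:
    """Stand-in for an LLM: emits buggy code first, then a fix once it sees the error."""
    if "NameError" in feedback:
        return "values = [1, 2, 3]\nprint(sum(values) / len(values))"
    return "print(sum(values) / len(values))"  # bug: 'values' is undefined


if __name__ == "__main__":
    print(route("netcdf"))
    result, attempts = solve_with_feedback("compute the mean", toy_generator)
    print(result.strip(), "after", attempts, "attempts")
```

In this toy run the first attempt fails with a `NameError`, the error text is passed back to the generator, and the second attempt succeeds. A production system would replace the subprocess call with a properly restricted execution environment and would cap retries to bound cost.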