Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

📅 2026-05-28
🤖 AI Summary
Marine lead (Pb) data remain largely embedded in unstructured academic literature, creating data silos that hinder large-scale synthesis. Manual extraction is labor-intensive and does not scale, while general-purpose large language models (LLMs) often produce errors because they lack domain knowledge. To address this, the study proposes Compass, an expert-guided LLM adaptation framework built around a domain-specific knowledge tree co-designed with marine scientists; the tree decomposes complex extraction tasks into verifiable steps and is combined with multi-level validation. Without model fine-tuning, the approach extracts 3,751 new Pb records from a corpus of over 230,000 publications with 92% accuracy (confirmed by expert manual verification), producing the largest integrated marine Pb database to date and improving coverage in under-sampled regions such as the East China Sea and the Southern Ocean. An interactive visualization platform accompanies the dataset for community access.
📝 Abstract
Marine lead (Pb) and its isotopes are critical tracers for ocean circulation and anthropogenic pollution, yet in-situ observations remain costly and sparse. While vast historical records exist, they lie buried within the unstructured content of academic papers, creating "data silos" inaccessible to comprehensive analysis. Manual extraction is unscalable, while general-purpose Large Language Models (LLMs) lack the necessary domain-specific knowledge, leading to hallucinations and scientifically invalid outputs. To address this, we introduce an expert-guided adaptation approach that enables LLMs to perform rigorous scientific data extraction without fine-tuning. We operationalize this approach through Compass, an LLM agent framework enhanced by a Knowledge Tree co-designed with marine scientists, which decomposes complex tasks into verifiable steps, guiding the agent's reasoning to ensure scientific validity. Deploying Compass across a corpus of over 230,000 relevant open-access papers, we successfully extract 3,751 previously unincorporated Pb records. This effort establishes the largest integrated marine Pb database to date. Beyond standard metrics, Compass demonstrates superior reliability through multi-layered validation, achieving 92% accuracy as confirmed through expert manual verification. The newly integrated data expand coverage in previously under-sampled regions such as the East China Sea and the Southern Ocean, providing an enriched data foundation for future scientific discoveries. We release an interactive visualization platform to facilitate open scientific access. Our work demonstrates that expert-guided agents can effectively bridge the gap between general-purpose LLMs and high-stakes scientific domains, enabling scalable data discovery in geosciences.
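The abstract does not specify how the Knowledge Tree is represented, so the following is only a minimal sketch of the general idea of decomposing validation into verifiable steps. The `KnowledgeNode` class, the `first_failure` helper, the record fields (`pb`, `lat`, `lon`), and the three rules are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict  # hypothetical: one extracted Pb measurement as a plain dict


@dataclass
class KnowledgeNode:
    """One verifiable step; its children are checked only if this step passes."""
    name: str
    check: Callable[[Record], bool]
    children: list["KnowledgeNode"] = field(default_factory=list)


def first_failure(node: KnowledgeNode, record: Record) -> Optional[str]:
    """Walk the tree depth-first; return the first failing step name, or None."""
    if not node.check(record):
        return node.name
    for child in node.children:
        failed = first_failure(child, record)
        if failed is not None:
            return failed
    return None


# Hypothetical rules (assumptions for illustration, not from the paper).
tree = KnowledgeNode(
    "has_pb_value",
    lambda r: isinstance(r.get("pb"), (int, float)),
    [
        KnowledgeNode("non_negative_concentration", lambda r: r["pb"] >= 0),
        KnowledgeNode(
            "valid_coordinates",
            lambda r: -90 <= r.get("lat", 999) <= 90
            and -180 <= r.get("lon", 999) <= 180,
        ),
    ],
)

good = {"pb": 35.2, "lat": 30.5, "lon": 123.0}
bad = {"pb": 35.2, "lat": 130.5, "lon": 123.0}  # latitude out of range

print(first_failure(tree, good))  # None
print(first_failure(tree, bad))   # valid_coordinates
```

Returning the name of the failing step, rather than a bare pass/fail flag, is one way to keep each rejection traceable to a specific rule that a domain expert could review.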
Problem

Research questions and friction points this paper is trying to address.

marine lead
data silos
scientific data extraction
domain-specific knowledge
unstructured academic literature
Innovation

Methods, ideas, or system contributions that make the work stand out.

expert-guided LLM
Knowledge Tree
marine lead data integration
scientific data extraction
LLM agent
Authors
Yiming Liu
School of Information Science and Electronic Engineering, Shanghai Jiao Tong University
Bin Lu
Shanghai Jiao Tong University
graph neural network, spatiotemporal data mining, AI for Science, GeoAI
Meng Jin
Shanghai Jiao Tong University
wireless communication, backscatter, RFID
Ziyuan Sang
School of Information Science and Electronic Engineering, Shanghai Jiao Tong University
Shuo Jiang
State Key Laboratory of Estuarine and Coastal Research, East China Normal University
Lei Zhou
Institute of Oceanography, Shanghai Jiao Tong University
Meteorology & Atmospheric Sciences, Oceanography
Xinbing Wang
School of Information Science and Electronic Engineering, Shanghai Jiao Tong University
Chenghu Zhou
Institute of Geographical Science and Natural Resources Research, Chinese Academy of Sciences
Jing Zhang
State Key Laboratory of Estuarine and Coastal Research, East China Normal University