Towards LLM Agents for Earth Observation

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI systems exhibit insufficient reliability in Earth observation tasks, hindering critical applications such as environmental monitoring and disaster management. To diagnose this limitation, we introduce UnivEarth—the first LLM-agent benchmark tailored for Earth observation—comprising 140 yes/no questions. Evaluation reveals that state-of-the-art LLM agents fail to correctly invoke the Google Earth Engine (GEE) API in over 58% of cases, achieving only 33% accuracy. To address this, we propose a lightweight adaptation method based on synthetically generated data, employing supervised fine-tuning to improve small language models' (e.g., Llama-3.1-8B) comprehension and execution of the GEE API. Our approach achieves accuracy comparable to large models like DeepSeek-R1 on UnivEarth, while substantially reducing deployment cost and computational overhead. This work establishes a new paradigm for developing trustworthy, efficient, and cost-effective Earth observation AI agents and releases the benchmark and methods as open-source resources.
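The evaluation protocol described above can be sketched as follows. This is our own illustration, not the authors' released harness: an agent must emit runnable code for each yes/no question, code that fails to execute counts toward the failure rate, and only executed code can contribute to accuracy. The `answer`-variable convention and the toy agent are assumptions for the sketch.

```python
def evaluate(agent, questions):
    """Score an agent on (question, ground_truth) yes/no pairs.

    Returns (accuracy, execution_failure_rate). The agent is a callable
    that maps a question string to a Python snippet; by convention here,
    the snippet must set a variable named `answer` to "yes" or "no".
    """
    correct = failures = 0
    for question, truth in questions:
        code = agent(question)          # agent emits analysis code as text
        env = {}
        try:
            exec(code, env)             # run the generated code
        except Exception:
            failures += 1               # unrunnable code yields no answer
            continue
        if env.get("answer") == truth:
            correct += 1
    n = len(questions)
    return correct / n, failures / n

# Toy agent: one snippet answers correctly, one crashes (a bad API call),
# one runs but answers incorrectly.
snippets = {
    "Q1": "answer = 'yes'",
    "Q2": "import nonexistent_module",
    "Q3": "answer = 'no'",
}
acc, fail_rate = evaluate(lambda q: snippets[q],
                          [("Q1", "yes"), ("Q2", "yes"), ("Q3", "yes")])
```

Under this protocol a high execution-failure rate directly caps accuracy, which is why the paper's 58% failure rate pins agent accuracy near 33%.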

📝 Abstract
Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: Are AI systems ready for reliable Earth Observation? We introduce UnivEarth, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents achieve only 33% accuracy because the generated code fails to run over 58% of the time. We reduce the failure rate for open models by fine-tuning on synthetic data, allowing much smaller models (Llama-3.1-8B) to achieve accuracy comparable to much larger ones (e.g., DeepSeek-R1). Taken together, our findings identify significant challenges to be solved before AI agents can automate Earth observation, and suggest paths forward. The project page is available at https://iandrover.github.io/UnivEarth.
Problem

Research questions and friction points this paper is trying to address.

Assessing whether current AI systems are ready for reliable Earth Observation
Evaluating the accuracy of LLM agents on EO questions when using the GEE API as a tool
Reducing the code-execution failure rate of open models on EO tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

UnivEarth: a benchmark of 140 yes/no questions drawn from NASA Earth Observatory articles
Supervised fine-tuning of open models on synthetic data
Much smaller models (Llama-3.1-8B) matching the accuracy of much larger ones (e.g., DeepSeek-R1)