AutoGEEval: A Multimodal and Automated Framework for Geospatial Code Generation on GEE with Large Language Models

📅 2025-05-19
🤖 AI Summary
Problem: The lack of standardized, automated evaluation tools for geospatial code generation hinders rigorous assessment of large language models (LLMs) in this domain. Method: We propose AutoGEEval, the first multimodal, unit-level automated evaluation framework tailored to Google Earth Engine (GEE). It introduces AutoGEEval-Bench, a dedicated benchmark comprising 1,325 test cases spanning 26 GEE data types. The framework integrates GEE's Python API, LLMs, multimodal prompt engineering, dynamic execution sandboxing, and fine-grained error classification to evaluate natural-language-to-geospatial-code translation end to end. Contribution/Results: AutoGEEval establishes a unified evaluation protocol for GEE-based code generation. We systematically assess 18 state-of-the-art LLMs and quantify how they differ in accuracy, computational resource consumption, execution efficiency, and error patterns. Both the benchmark and the framework are open-sourced, providing a reproducible, extensible evaluation infrastructure for research on geospatial code generation.

📝 Abstract
Geospatial code generation is emerging as a key direction in the integration of artificial intelligence and geoscientific analysis. However, standardized tools for automatic evaluation in this domain are still lacking. To address this gap, we propose AutoGEEval, the first multimodal, unit-level automated evaluation framework for geospatial code generation by large language models (LLMs) on the Google Earth Engine (GEE) platform. Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1,325 test cases that span 26 GEE data types. The framework integrates question generation and answer verification components to form an end-to-end automated evaluation pipeline, from function invocation to execution validation. AutoGEEval supports multidimensional quantitative analysis of model outputs in terms of accuracy, resource consumption, execution efficiency, and error types. We evaluate 18 state-of-the-art LLMs (general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models), revealing their performance characteristics and potential optimization pathways in GEE code generation. This work provides a unified protocol and foundational resource for the development and assessment of geospatial code generation models, advancing the frontier of automated translation from natural language to domain-specific code.
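
The "function invocation to execution validation" step can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the AutoGEEval implementation: `EvalCase`, `EvalResult`, `evaluate`, and the error labels are hypothetical names, a plain Python function stands in for a GEE call, and `exec` is used only for brevity (it is not a sandbox).

```python
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class EvalCase:
    """One unit-level test: call `function_name` with `params`, expect `expected`."""
    function_name: str
    params: dict
    expected: Any


@dataclass
class EvalResult:
    passed: bool
    error_type: str  # hypothetical labels, not the paper's error taxonomy
    seconds: float   # wall-clock time spent loading and running the candidate


def evaluate(generated_code: str, case: EvalCase) -> EvalResult:
    """Load model-generated code, invoke the target function, verify the answer."""
    start = time.perf_counter()

    def finish(passed: bool, error_type: str) -> EvalResult:
        return EvalResult(passed, error_type, time.perf_counter() - start)

    namespace: dict = {}
    try:
        # Assumption: exec() stands in for an isolated execution environment.
        exec(compile(generated_code, "<generated>", "exec"), namespace)
    except SyntaxError:
        return finish(False, "syntax")
    except Exception:
        return finish(False, "runtime")

    func = namespace.get(case.function_name)
    if not callable(func):
        return finish(False, "missing_function")

    try:
        actual = func(**case.params)
    except Exception:
        return finish(False, "runtime")

    if actual != case.expected:
        return finish(False, "wrong_answer")
    return finish(True, "none")


# Usage: a toy case with integer inputs so equality comparison is exact.
case = EvalCase("bbox_area", {"width": 3, "height": 4}, 12)
candidate = "def bbox_area(width, height):\n    return width * height\n"
result = evaluate(candidate, case)
print(result.passed, result.error_type)
```

A real harness for GEE outputs would likely need type-aware comparison (for example, tolerances for numeric values or structural checks for geometries) rather than plain equality, and would track resource usage alongside elapsed time.
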
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized tools for geospatial code evaluation
Need for automated multimodal assessment of GEE code generation
Performance benchmarking of LLMs in geospatial coding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal automated framework for geospatial code
Benchmark suite with 1325 GEE test cases
End-to-end evaluation pipeline with LLMs
Shuyang Hou
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Zhangxiao Shen
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Huayi Wu
Wuhan University
GIS, remote sensing, cartography, Geomatics
Jianyuan Liang
Wuhan University
GIS System, GIService, Spatial Data Mining, Graph RAG
Haoyue Jiao
Wuhan University
GeoAI, Large Language Model, Code Generation
Yaxian Qing
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Xiaopu Zhang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Xu Li
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Zhipeng Gui
Professor of GIScience, Wuhan University
GeoAI, Spatiotemporal Data Analysis, Web Service & QoS, High Performance Computing
Xuefeng Guan
Professor, Wuhan University
High-performance GeoComputation, Big-data Analytics, Spatial Data Mining
Longgang Xiang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China