GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the challenges facing multimodal foundation models for geoscience and remote sensing: broad knowledge requirements, strong modality heterogeneity, and fragmented tasks, compounded by the lack of a unified evaluation benchmark and an efficient reasoning framework. It introduces GeoMMBench, the first comprehensive multimodal question-answering benchmark for remote sensing that spans multiple disciplines, sensor types, and tasks. It also proposes GeoMMAgent, a multi-agent system augmented with domain-specific tools that integrates retrieval, perception, and reasoning. Evaluating 36 state-of-the-art large models on GeoMMBench reveals significant deficiencies in geoscientific knowledge and complex reasoning. GeoMMAgent substantially outperforms monolithic models, achieving expert-level performance on specialized remote sensing tasks.

📝 Abstract
Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, all of which are capabilities essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval, perception, and reasoning through domain-specific RS models and tools. Extensive experimental results demonstrate that GeoMMAgent significantly outperforms standalone LLMs, underscoring the importance of tool-augmented agents for dynamically tackling complex geoscience and RS challenges.
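To make the retrieval, perception, and reasoning stages named in the abstract more concrete, the following is a minimal illustrative sketch of how such a tool-augmented pipeline could be wired together. It is not GeoMMAgent's actual implementation, which the abstract does not detail. Every name here (`Question`, `Evidence`, `retrieve_knowledge`, `perceive_image`, `reason`, `run_pipeline`) is hypothetical, and the lookup tables are placeholders standing in for a real knowledge base, RS perception models, and an LLM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """A benchmark-style item: question text plus a reference to an image."""
    text: str
    image_id: str


@dataclass
class Evidence:
    """Intermediate findings collected by the stages before answering."""
    retrieved: List[str] = field(default_factory=list)
    perceived: List[str] = field(default_factory=list)


def retrieve_knowledge(q: Question) -> List[str]:
    # Hypothetical retrieval stage: keyword match against a toy knowledge base.
    knowledge_base = {
        "sar": "SAR sensors measure microwave backscatter and work through clouds.",
        "ndvi": "NDVI contrasts near-infrared and red reflectance to indicate vegetation.",
    }
    return [fact for key, fact in knowledge_base.items() if key in q.text.lower()]


def perceive_image(q: Question) -> List[str]:
    # Hypothetical perception stage: a lookup table instead of a real detector.
    fake_detections = {"img_001": ["water body", "urban area"]}
    return fake_detections.get(q.image_id, [])


def reason(q: Question, ev: Evidence) -> str:
    # Hypothetical reasoning stage: summarises evidence instead of calling an LLM.
    objects = ", ".join(ev.perceived) or "none"
    return f"Q: {q.text} | facts={len(ev.retrieved)} | objects={objects}"


def run_pipeline(q: Question) -> str:
    """Run retrieval and perception, then hand the evidence to reasoning."""
    ev = Evidence()
    ev.retrieved = retrieve_knowledge(q)
    ev.perceived = perceive_image(q)
    return reason(q, ev)


if __name__ == "__main__":
    print(run_pipeline(Question("Why can SAR image through clouds?", "img_001")))
```

The point of the sketch is only the separation of concerns: each stage can be swapped for a stronger domain tool without changing the others, which is one plausible reading of why a tool-augmented agent might outperform a standalone model.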
Problem

Research questions and friction points this paper is trying to address.

geoscience
remote sensing
multimodal large language models
domain knowledge
heterogeneous sensor modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

GeoMMBench
GeoMMAgent
multimodal large language models
remote sensing
tool-augmented agents