GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenges facing multimodal foundation models in geoscience and remote sensing: broad knowledge requirements, strong modality heterogeneity, and fragmented tasks, compounded by the absence of a unified evaluation benchmark and an efficient reasoning framework. To this end, the authors introduce GeoMMBench, the first comprehensive multimodal question-answering benchmark for remote sensing that spans multiple disciplines, sensor types, and tasks. They also propose GeoMMAgent, a multi-agent system augmented with domain-specific tools that integrates retrieval, perception, and reasoning. An evaluation of 36 state-of-the-art large models on GeoMMBench reveals significant deficiencies in geoscientific knowledge and complex reasoning. GeoMMAgent substantially outperforms monolithic models, achieving expert-level performance on specialized remote sensing tasks.

📝 Abstract

Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, all of which are essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval, perception, and reasoning through domain-specific RS models and tools. Extensive experimental results demonstrate that GeoMMAgent significantly outperforms standalone LLMs, underscoring the importance of tool-augmented agents for dynamically tackling complex geoscience and RS challenges.
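
The abstract describes GeoMMAgent only at a high level, as a framework that combines retrieval, perception, and reasoning. The following is a minimal sketch of how such a three-stage flow could be wired together for a multiple-choice question. It is not the authors' implementation: the multiple-choice format, the names `Question`, `Evidence`, `retrieve`, `perceive`, `reason`, and `answer`, the keyword-matching retrieval, and the counting-based reasoning step are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Question:
    text: str
    image_id: str
    options: Dict[str, str]  # hypothetical format, e.g. {"A": "cropland", "B": "water"}


@dataclass
class Evidence:
    facts: List[str] = field(default_factory=list)   # gathered by retrieval
    labels: List[str] = field(default_factory=list)  # gathered by perception


def retrieve(question: Question, knowledge: Dict[str, str]) -> List[str]:
    """Toy retrieval: return notes whose keyword occurs in the question text."""
    text = question.text.lower()
    return [note for key, note in knowledge.items() if key in text]


def perceive(question: Question, detector: Callable[[str], List[str]]) -> List[str]:
    """Toy perception: delegate to a caller-supplied (hypothetical) image model."""
    return detector(question.image_id)


def reason(question: Question, evidence: Evidence) -> str:
    """Toy reasoning: choose the option mentioned most often in the evidence.

    Ties are broken by option key in alphabetical order.
    """
    pool = " ".join(evidence.facts + evidence.labels).lower()
    scores = {key: pool.count(value.lower()) for key, value in question.options.items()}
    return max(sorted(scores), key=scores.get)


def answer(
    question: Question,
    knowledge: Dict[str, str],
    detector: Callable[[str], List[str]],
) -> str:
    """Run retrieval and perception, then reason over the combined evidence."""
    evidence = Evidence(
        facts=retrieve(question, knowledge),
        labels=perceive(question, detector),
    )
    return reason(question, evidence)


if __name__ == "__main__":
    # Illustrative inputs only; none of these come from GeoMMBench.
    knowledge = {
        "sar": "In SAR imagery, calm water appears dark because of specular reflection."
    }
    question = Question(
        text="Which land cover dominates the dark region of this SAR scene?",
        image_id="scene_001",
        options={"A": "cropland", "B": "water"},
    )
    print(answer(question, knowledge, lambda image_id: ["water", "water", "cropland"]))
```

In a real system each stage would presumably be backed by a domain-specific model or tool rather than string matching. The sketch only shows why separating the stages lets the final answer draw on both retrieved knowledge and image-derived labels.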

Problem

Research questions and friction points this paper is trying to address.

geoscience

remote sensing

multimodal large language models

domain knowledge

heterogeneous sensor modalities

Innovation

Methods, ideas, or system contributions that make the work stand out.

GeoMMBench

GeoMMAgent

multimodal large language models