AI Summary
This study addresses three challenges facing multimodal foundation models for geoscience and remote sensing: broad knowledge requirements, strong modality heterogeneity, and fragmented tasks. These are compounded by the absence of a unified evaluation benchmark and an efficient reasoning framework. To this end, we introduce GeoMMBench, the first comprehensive multimodal question-answering benchmark for remote sensing that spans multiple disciplines, sensor types, and tasks. We also propose GeoMMAgent, a multi-agent system augmented with domain-specific tools that integrates retrieval, perception, and reasoning capabilities. Evaluating 36 state-of-the-art large models on GeoMMBench reveals significant deficiencies in geoscientific knowledge and complex reasoning. Furthermore, GeoMMAgent substantially outperforms monolithic models, achieving expert-level performance on specialized remote sensing tasks.
Abstract
Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, all of which are essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval, perception, and reasoning through domain-specific RS models and tools. Extensive experimental results demonstrate that GeoMMAgent significantly outperforms standalone LLMs, underscoring the importance of tool-augmented agents for dynamically tackling complex geoscience and RS challenges.