LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: remote-sensing lithological interpretation depends heavily on expert knowledge, and no systematic benchmark exists for evaluating how well large models understand geological semantics. To bridge this gap, we introduce LithoBench, the first benchmark designed specifically for remote-sensing lithological interpretation that integrates multi-level geological semantics and expert assessment. LithoBench comprises 10,000 expert-annotated samples across 12 lithological classes, organized into 4,000 multiple-choice and 6,000 open-ended questions structured along five levels of cognitive complexity. Data validity and evaluation reliability are supported by a semi-automated, expert-in-the-loop construction pipeline that incorporates structured image descriptions. Experimental results show that current large models have significant deficiencies in higher-order geological reasoning tasks, which underscores the need for a benchmark such as LithoBench.

📝 Abstract

Remote sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, lithology interpretation is a knowledge-intensive task that requires experts to infer rock types from subtle visual, spectral, textural, geomorphological, and contextual cues, which makes reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opportunities, yet their evaluation remains constrained by the lack of benchmarks that capture lithological annotations, multi-level geological semantics, and expert-informed assessment. Here, we propose LithoBench, a multi-level benchmark for evaluating geological semantic understanding in remote sensing lithology interpretation. LithoBench contains 10,000 expert-annotated interpretation instances across 12 representative lithological categories, including 4,000 multiple-choice and 6,000 open-ended tasks organized into five cognitive levels: Identification and Description, Comparative Analysis, Mechanism Explanation, Practical Application, and Comprehensive Reasoning. We further develop an expert-in-the-loop, knowledge-grounded, semi-automated construction pipeline that couples multiple sub-processes, such as structured geological image description, to enhance geological validity and evaluation reliability. Experiments with multiple large vision-language models reveal substantial limitations in geological semantic understanding, particularly on higher-order explanation, application, and reasoning tasks.
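
To make the benchmark structure described above concrete, the following is a minimal Python sketch of how a multiple-choice item tagged with one of the five cognitive levels might be represented, and how accuracy could be broken down per level. Only the five level names come from the abstract. The `MultipleChoiceItem` fields, the `accuracy_by_level` function, and the toy items are hypothetical illustrations, not LithoBench's actual schema, data, or scoring protocol, and the sketch does not cover the open-ended tasks.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Tuple


class CognitiveLevel(Enum):
    """The five cognitive levels named in the abstract."""
    IDENTIFICATION_AND_DESCRIPTION = 1
    COMPARATIVE_ANALYSIS = 2
    MECHANISM_EXPLANATION = 3
    PRACTICAL_APPLICATION = 4
    COMPREHENSIVE_REASONING = 5


@dataclass(frozen=True)
class MultipleChoiceItem:
    """Hypothetical record layout; the real field names are not given in the source."""
    item_id: str
    level: CognitiveLevel
    question: str
    options: Tuple[str, ...]
    answer_index: int  # index into `options` of the correct choice


def accuracy_by_level(
    items: Iterable[MultipleChoiceItem],
    predictions: Dict[str, int],
) -> Dict[CognitiveLevel, float]:
    """Return the fraction of correct answers per cognitive level.

    `predictions` maps item_id to the predicted option index.
    An item with no prediction is counted as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.level] += 1
        if predictions.get(item.item_id) == item.answer_index:
            correct[item.level] += 1
    return {level: correct[level] / total[level] for level in total}


# Toy placeholder items, invented for illustration only (not benchmark data).
toy_items = [
    MultipleChoiceItem("q1", CognitiveLevel.IDENTIFICATION_AND_DESCRIPTION,
                       "placeholder question 1", ("A", "B", "C", "D"), 0),
    MultipleChoiceItem("q2", CognitiveLevel.IDENTIFICATION_AND_DESCRIPTION,
                       "placeholder question 2", ("A", "B", "C", "D"), 2),
    MultipleChoiceItem("q3", CognitiveLevel.COMPREHENSIVE_REASONING,
                       "placeholder question 3", ("A", "B", "C", "D"), 1),
]
toy_predictions = {"q1": 0, "q2": 1, "q3": 1}

for level, score in accuracy_by_level(toy_items, toy_predictions).items():
    print(f"{level.name}: {score:.2f}")
```

Reporting accuracy per level rather than as a single aggregate is what would expose the gap the abstract describes, where models do worse on explanation, application, and reasoning tasks than on identification.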

Problem

Research questions and friction points this paper is trying to address.

lithology interpretation

remote sensing

large multimodal models

benchmark

geological semantics

Innovation

Methods, ideas, or system contributions that make the work stand out.

lithology interpretation

large multimodal models

geological semantics