GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation of existing large remote sensing models: their inability to perceive height, the vertical dimension essential for spatial reasoning in complex scenes. To bridge this gap, we propose GeoHeightChat, the first height-aware multimodal remote sensing understanding framework. Our approach introduces a vision-language-model-driven data generation pipeline to construct two novel benchmarks, GeoHeight-Bench and GeoHeight-Bench+, and integrates height awareness through prompt engineering, metadata extraction, and implicit injection of geometric features. This enables scalable height annotation and interactive height-based reasoning. Experimental results demonstrate that GeoHeightChat substantially mitigates the "vertical blind spot" of current models, achieving significant performance gains in relative height analysis and terrain-aware reasoning tasks.

📝 Abstract
Current Large Multimodal Models (LMMs) in Earth Observation typically neglect the critical "vertical" dimension, limiting their reasoning capabilities in complex remote sensing geometries and disaster scenarios where physical spatial structures often outweigh planar visual textures. To bridge this gap, we introduce a comprehensive evaluation framework dedicated to height-aware remote sensing understanding. First, to overcome the severe scarcity of annotated data, we develop a scalable, VLM-driven data generation pipeline utilizing systematic prompt engineering and metadata extraction. This pipeline constructs two complementary benchmarks: GeoHeight-Bench for relative height analysis, and a more challenging GeoHeight-Bench+ for holistic, terrain-aware reasoning. Furthermore, to validate the necessity of height perception, we propose GeoHeightChat, the first height-aware remote sensing LMM baseline. Serving as a strong proof of concept, our baseline demonstrates that synergizing visual semantics with implicitly injected height geometric features effectively mitigates the "vertical blind spot", successfully unlocking a new paradigm of interactive height reasoning in existing optical models.
Problem

Research questions and friction points this paper is trying to address.

height-aware reasoning
remote sensing
vertical dimension
multimodal models
spatial structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

height-aware reasoning
multimodal remote sensing
Large Multimodal Models (LMMs)
benchmark construction
vertical dimension