Enhancing Monocular Height Estimation via Sparse LiDAR-Guided Correction

📅 2025-05-11
🤖 AI Summary
Monocular height estimation (MHE) on ultra-high-resolution remote sensing imagery suffers from structural information scarcity, leading to excessive reliance on shadow cues—causing uncontrolled errors and spatial inconsistency. To address this, we propose a two-stage correction framework: first, interpretability-based attribution analysis uncovers the implicit shadow bias of models trained on synthetic data; second, sparse but globally distributed ICESat-2 LiDAR point clouds are interpolated via random forest to generate a guidance field that enforces spatial consistency in deep learning predictions. This work introduces the first LiDAR-guided weakly supervised spatial correction paradigm. Evaluated on Saint-Omer, Tokyo, and São Paulo, our method reduces mean absolute error (MAE) by 22.8%, 6.9%, and 4.9%, respectively, significantly improving local accuracy and terrain continuity.

📝 Abstract
Monocular height estimation (MHE) from very-high-resolution (VHR) remote sensing imagery via deep learning is notoriously challenging due to the lack of sufficient structural information. Conventional digital elevation models (DEMs), typically derived from airborne LiDAR or multi-view stereo, remain costly and geographically limited. Recently, models trained on synthetic data and refined through domain adaptation have shown remarkable performance in MHE, yet it remains unclear how these models make predictions or how reliable they truly are. In this paper, we investigate a state-of-the-art MHE model trained purely on synthetic data to explore where the model looks when making height predictions. Through systematic analyses, we find that the model relies heavily on shadow cues, a factor that can lead to overestimation or underestimation of heights when shadows deviate from expected norms. Furthermore, the inherent difficulty of evaluating regression tasks with the human eye underscores additional limitations of purely synthetic training. To address these issues, we propose a novel correction pipeline that integrates sparse, imperfect global LiDAR measurements (ICESat-2) with deep-learning outputs to improve local accuracy and achieve spatially consistent corrections. Our method comprises two stages: pre-processing raw ICESat-2 data, followed by a random forest-based approach to densely refine height estimates. Experiments in three representative urban regions -- Saint-Omer, Tokyo, and São Paulo -- reveal substantial error reductions, with mean absolute error (MAE) decreased by 22.8%, 6.9%, and 4.9%, respectively. These findings highlight the critical role of shadow awareness in synthetic data-driven models and demonstrate how fusing imperfect real-world LiDAR data can bolster the robustness of MHE, paving the way for more reliable and scalable 3D mapping solutions.
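The second stage described above (random forest-based dense refinement from sparse LiDAR points) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the feature design (pixel position plus the model's own estimate), the residual-learning formulation, the hyperparameters, and all variable names are assumptions, and the sparse "ICESat-2-like" samples here are synthetic.

```python
# Hypothetical sketch of LiDAR-guided correction of a dense monocular
# height map using a random forest. Feature choices and parameters are
# illustrative assumptions, not the paper's actual configuration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Dense monocular height prediction over a small tile (H x W, meters).
H, W = 64, 64
pred = rng.uniform(0, 40, size=(H, W))

# Sparse ground-truth-like samples standing in for pre-processed
# ICESat-2 footprints: (row, col, height) triplets with noise.
n_pts = 300
rows = rng.integers(0, H, n_pts)
cols = rng.integers(0, W, n_pts)
truth_at_pts = pred[rows, cols] + rng.normal(0, 2, n_pts)

# Train a random forest to predict the residual (LiDAR height minus
# model prediction) from spatial position and the model's own estimate.
X = np.column_stack([rows, cols, pred[rows, cols]])
y = truth_at_pts - pred[rows, cols]
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Apply densely: predict a residual field for every pixel and add it
# back to obtain a spatially consistent corrected height map.
rr, cc = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X_dense = np.column_stack([rr.ravel(), cc.ravel(), pred.ravel()])
corrected = pred + rf.predict(X_dense).reshape(H, W)
```

In this residual formulation the forest only has to model the deep network's error field, which tends to vary smoothly in space, rather than absolute heights; this is one plausible reading of how sparse points can act as a dense "guidance field".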
Problem

Research questions and friction points this paper is trying to address.

Monocular height estimation lacks sufficient structural information in VHR imagery
Models trained on synthetic data rely on unreliable shadow cues, causing over- and underestimation
Dense reference DEMs from airborne LiDAR or multi-view stereo are costly and geographically limited
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse, globally distributed ICESat-2 LiDAR guides correction of monocular height estimates
A random forest densifies sparse point measurements into a spatially consistent correction field
Attribution analysis exposes the implicit shadow bias of synthetic-data-trained models
Jian Song
Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8561, Japan; RIKEN Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, 103-0027, Japan
Hongruixuan Chen
The University of Tokyo, RIKEN
Deep Learning, Computer Vision, GeoAI, AI4EO, Multimodal Remote Sensing
Naoto Yokoya
The University of Tokyo, RIKEN
Remote Sensing, Computer Vision, Machine Learning, Data Fusion