Enhancing Monocular Height Estimation via Sparse LiDAR-Guided Correction

📅 2025-05-11
🤖 AI Summary
Monocular height estimation (MHE) on ultra-high-resolution remote sensing imagery suffers from structural information scarcity, leading to excessive reliance on shadow cues—causing uncontrolled errors and spatial inconsistency. To address this, we propose a two-stage correction framework: first, interpretability-based attribution analysis uncovers the implicit shadow bias of models trained on synthetic data; second, sparse but globally distributed ICESat-2 LiDAR point clouds are interpolated via random forest to generate a guidance field that enforces spatial consistency in deep learning predictions. This work introduces the first LiDAR-guided weakly supervised spatial correction paradigm. Evaluated on Saint-Omer, Tokyo, and São Paulo, our method reduces mean absolute error (MAE) by 22.8%, 6.9%, and 4.9%, respectively, significantly improving local accuracy and terrain continuity.

📝 Abstract
Monocular height estimation (MHE) from very-high-resolution (VHR) remote sensing imagery via deep learning is notoriously challenging due to the lack of sufficient structural information. Conventional digital elevation models (DEMs), typically derived from airborne LiDAR or multi-view stereo, remain costly and geographically limited. Recently, models trained on synthetic data and refined through domain adaptation have shown remarkable performance in MHE, yet it remains unclear how these models make predictions or how reliable they truly are. In this paper, we investigate a state-of-the-art MHE model trained purely on synthetic data to explore where the model looks when making height predictions. Through systematic analyses, we find that the model relies heavily on shadow cues, a factor that can lead to overestimation or underestimation of heights when shadows deviate from expected norms. Furthermore, the inherent difficulty of evaluating regression tasks with the human eye underscores additional limitations of purely synthetic training. To address these issues, we propose a novel correction pipeline that integrates sparse, imperfect global LiDAR measurements (ICESat-2) with deep-learning outputs to improve local accuracy and achieve spatially consistent corrections. Our method comprises two stages: pre-processing raw ICESat-2 data, followed by a random forest-based approach to densely refine height estimates. Experiments in three representative urban regions -- Saint-Omer, Tokyo, and São Paulo -- reveal substantial error reductions, with mean absolute error (MAE) decreased by 22.8%, 6.9%, and 4.9%, respectively. These findings highlight the critical role of shadow awareness in synthetic data-driven models and demonstrate how fusing imperfect real-world LiDAR data can bolster the robustness of MHE, paving the way for more reliable and scalable 3D mapping solutions.
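The second stage described above (random forest-based dense refinement from sparse LiDAR points) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the feature design (pixel position plus the model's own estimate), the residual-learning formulation, the hyperparameters, and all variable names are assumptions, and the sparse "ICESat-2-like" samples here are synthetic.

```python
# Hypothetical sketch of LiDAR-guided correction of a dense monocular
# height map using a random forest. Feature choices and parameters are
# illustrative assumptions, not the paper's actual configuration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Dense monocular height prediction over a small tile (H x W, meters).
H, W = 64, 64
pred = rng.uniform(0, 40, size=(H, W))

# Sparse ground-truth-like samples standing in for pre-processed
# ICESat-2 footprints: (row, col, height) triplets with noise.
n_pts = 300
rows = rng.integers(0, H, n_pts)
cols = rng.integers(0, W, n_pts)
truth_at_pts = pred[rows, cols] + rng.normal(0, 2, n_pts)

# Train a random forest to predict the residual (LiDAR height minus
# model prediction) from spatial position and the model's own estimate.
X = np.column_stack([rows, cols, pred[rows, cols]])
y = truth_at_pts - pred[rows, cols]
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Apply densely: predict a residual field for every pixel and add it
# back to obtain a spatially consistent corrected height map.
rr, cc = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X_dense = np.column_stack([rr.ravel(), cc.ravel(), pred.ravel()])
corrected = pred + rf.predict(X_dense).reshape(H, W)
```

In this residual formulation the forest only has to model the deep network's error field, which tends to vary smoothly in space, rather than absolute heights; this is one plausible reading of how sparse points can act as a dense "guidance field".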
Problem

Research questions and friction points this paper is trying to address.

Monocular height estimation lacks sufficient structural information in VHR imagery
Models trained on synthetic data rely on unreliable shadow cues, causing over- and underestimation
Dense reference DEMs from airborne LiDAR or multi-view stereo are costly and geographically limited
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse, globally distributed ICESat-2 LiDAR guides correction of monocular height estimates
A random forest densifies sparse point measurements into a spatially consistent correction field
Attribution analysis exposes the implicit shadow bias of synthetic-data-trained models
Jian Song
Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8561, Japan; RIKEN Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, 103-0027, Japan
Hongruixuan Chen
The University of Tokyo, RIKEN
Deep Learning, Computer Vision, GeoAI, AI4EO, Multimodal Remote Sensing
Naoto Yokoya
The University of Tokyo, RIKEN
Remote Sensing, Computer Vision, Machine Learning, Data Fusion