AI Summary
In aerial multi-view stereo (MVS) reconstruction, large dynamic depth ranges and scarce texture impede reliable feature matching and cause significant depth estimation errors. To address these challenges in high-altitude scenarios, this paper proposes an adaptive MVS framework with three key innovations: (1) a novel adaptive depth range predictor guided by monocular depth and surface normals, enabling scene-aware depth hypothesis initialization; (2) a normal-guided cost aggregation and depth refinement module that relaxes the restrictive fixed-depth-interval assumption; and (3) a lightweight architecture integrating cross-attention-based discrepancy learning, geometric cue fusion, and cascaded optimization. Evaluated on the WHU, LuoJia-MVS, and München aerial benchmarks, the approach achieves state-of-the-art accuracy while accelerating inference and reducing computational complexity compared to mainstream methods.
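The summary contrasts scene-aware hypothesis initialization with a fixed depth interval. The minimal NumPy sketch below illustrates that contrast. It assumes that per-pixel hypotheses are spaced uniformly within a range map centered on a monocular depth prior. The function names, the uniform spacing, and the example depths are illustrative assumptions and do not come from the paper's implementation.

```python
import numpy as np


def sample_depth_hypotheses(center: np.ndarray, half_range: np.ndarray, num: int) -> np.ndarray:
    """Per-pixel hypotheses spaced uniformly in [center - half_range, center + half_range].

    center, half_range: (H, W) arrays, e.g. a monocular depth prior and a range map
    (hypothetical inputs). Returns an array of shape (num, H, W).
    """
    # Normalized offsets in [-1, 1]; num=9 gives steps of 0.25
    t = np.linspace(-1.0, 1.0, num).reshape(num, 1, 1)
    hyps = center[None] + t * half_range[None]
    # Assumption: keep hypotheses strictly positive (depth in front of the camera)
    return np.maximum(hyps, 1e-3)


def fixed_range_hypotheses(d_min: float, d_max: float, num: int, shape: tuple) -> np.ndarray:
    """Conventional baseline: the same global planes, at a fixed interval, for every pixel."""
    planes = np.linspace(d_min, d_max, num).reshape(num, 1, 1)
    return np.broadcast_to(planes, (num, *shape)).copy()


# Hypothetical 2x2 example (metres): rooftop and ground pixels seen from altitude
prior = np.array([[500.0, 505.0], [520.0, 518.0]])  # monocular depth prior
half = np.array([[4.0, 4.0], [8.0, 8.0]])           # predicted half-range per pixel
adaptive = sample_depth_hypotheses(prior, half, num=9)
fixed = fixed_range_hypotheses(400.0, 700.0, num=9, shape=prior.shape)

print(adaptive[1, 0, 0] - adaptive[0, 0, 0])  # 1.0  (8 m range / 8 steps)
print(fixed[1, 0, 0] - fixed[0, 0, 0])        # 37.5 (300 m range / 8 steps)
```

Both samplers use the same number of planes. The adaptive sampler places its planes where the prior says the surface is, so the effective depth interval in this toy case shrinks from 37.5 m to 1 to 2 m. The trade-off is that an inaccurate prior or an overly narrow range can exclude the true depth.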
Abstract
Three-dimensional digital urban reconstruction from multi-view aerial images is a critical application in which deep multi-view stereo (MVS) methods outperform traditional techniques. However, existing methods commonly overlook key differences between aerial and close-range settings, such as depth ranges that vary along epipolar lines and the weak feature-matching discriminability of low-detail aerial images. To address these issues, we propose Adaptive Depth Range MVS (ADR-MVS), which integrates monocular geometric cues to improve multi-view depth estimation accuracy. The key component of ADR-MVS is the depth range predictor, which generates adaptive range maps from depth and normal estimates using cross-attention discrepancy learning. In the first stage, the range map derived from monocular cues extends beyond predefined depth boundaries, improving feature-matching discriminability and mitigating convergence to local optima. In later stages, the inferred range maps are progressively narrowed, ultimately aligning with the cascaded MVS framework for precise depth regression. Moreover, a normal-guided cost aggregation operation is specifically devised for aerial stereo images to improve geometric awareness within the cost volume. Finally, we introduce a normal-guided depth refinement module that surpasses existing RGB-guided techniques. Experimental results demonstrate that ADR-MVS achieves state-of-the-art performance on the WHU, LuoJia-MVS, and München datasets while exhibiting lower computational complexity.
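The abstract does not specify how normals enter the cost aggregation or refinement. One standard geometric relation that normal-guided methods can exploit is the local-plane assumption. Suppose pixel p has depth d_p and surface normal n. Then a neighboring pixel q on the same plane has depth d_q = d_p * (n . K^-1 p) / (n . K^-1 q), with p and q in homogeneous pixel coordinates. Unlike a fronto-parallel assumption, this relation lets depth vary across a sloped roof or facade. The minimal NumPy sketch below implements that relation for a pinhole camera with intrinsics K, with normals expressed in the camera frame. The function name and all numeric values are hypothetical, and the sketch is not ADR-MVS's aggregation operator.

```python
import numpy as np


def plane_induced_depth(d_p, p, q, normal, K):
    """Depth at pixel q, assuming q lies on the plane through the 3D point of
    pixel p (depth d_p) with surface normal `normal` (camera frame).
    p and q are (u, v) pixel coordinates: u = column, v = row."""
    K_inv = np.linalg.inv(K)
    # Back-projected rays; for a standard pinhole K their z-component is 1,
    # so scaling a ray by a depth gives the 3D point at that depth
    r_p = K_inv @ np.array([p[0], p[1], 1.0])
    r_q = K_inv @ np.array([q[0], q[1], 1.0])
    denom = normal @ r_q
    if abs(denom) < 1e-9:
        raise ValueError("ray through q is parallel to the plane")
    # Plane: n . X = n . (d_p r_p); solve n . (d_q r_q) = n . (d_p r_p) for d_q
    return d_p * (normal @ r_p) / denom


# Hypothetical camera: focal length 1000 px, principal point (500, 500)
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
flat = np.array([0.0, 0.0, -1.0])                    # fronto-parallel surface
sloped = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)   # surface tilted 45 degrees about the y-axis

print(plane_induced_depth(500.0, (500, 500), (600, 500), flat, K))    # equals d_p: constant depth
print(plane_induced_depth(500.0, (500, 500), (600, 500), sloped, K))  # about 555.56 (= 500 / 0.9)
```

Only the direction of the normal matters here, because its scale and sign cancel in the ratio. A neighbor's depth hypothesis could therefore be propagated along the predicted surface rather than copied unchanged.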