Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In aerial multi-view stereo (MVS) reconstruction, large dynamic depth ranges and texture scarcity impede reliable feature matching and cause significant depth estimation errors. To address these challenges in high-altitude scenarios, the paper proposes an adaptive MVS framework with three key contributions: (1) an adaptive depth range predictor guided by monocular depth and surface normals, enabling scene-aware initialization of depth hypotheses; (2) a normal-guided cost aggregation and depth refinement module that relaxes the restrictive fixed-depth-interval assumption; and (3) a lightweight architecture integrating cross-attention-based discrepancy learning, geometric cue fusion, and cascaded optimization. Evaluated on the WHU, LuoJia-MVS, and München aerial benchmarks, the approach achieves state-of-the-art accuracy while significantly accelerating inference and reducing computational complexity compared to mainstream methods.

📝 Abstract

Three-dimensional digital urban reconstruction from multi-view aerial images is a critical application where deep multi-view stereo (MVS) methods outperform traditional techniques. However, existing methods commonly overlook the key differences between aerial and close-range settings, such as depth ranges that vary along epipolar lines and the weak feature-matching discriminability of low-detail aerial images. To address these issues, we propose an Adaptive Depth Range MVS (ADR-MVS), which integrates monocular geometric cues to improve multi-view depth estimation accuracy. The key component of ADR-MVS is the depth range predictor, which generates adaptive range maps from depth and normal estimates using cross-attention discrepancy learning. In the first stage, the range map derived from monocular cues breaks through predefined depth boundaries, improving feature-matching discriminability and mitigating convergence to local optima. In later stages, the inferred range maps are progressively narrowed, ultimately aligning with the cascaded MVS framework for precise depth regression. Moreover, a normal-guided cost aggregation operation is specially devised for aerial stereo images to improve geometric awareness within the cost volume. Finally, we introduce a normal-guided depth refinement module that surpasses existing RGB-guided techniques. Experimental results demonstrate that ADR-MVS achieves state-of-the-art performance on the WHU, LuoJia-MVS, and München datasets while exhibiting lower computational complexity.
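The idea of per-pixel range maps that narrow across cascade stages can be illustrated with a minimal sketch. This is not the ADR-MVS implementation: the function names, the uniform sampling, the halving factor, and the toy depth values are all assumptions made for illustration, and the learned range predictor is replaced by hand-written numbers.

```python
def depth_hypotheses(center, width, num):
    """Return `num` evenly spaced depth hypotheses spanning `width` around `center`."""
    if num < 2:
        return [center]
    start = center - width / 2.0
    step = width / (num - 1)
    return [start + k * step for k in range(num)]


def cascade_ranges(initial_width, shrink, stages):
    """Per-stage range widths: each stage narrows the previous width by `shrink`."""
    return [initial_width * (shrink ** s) for s in range(stages)]


# Hypothetical per-pixel inputs (metres): a coarse depth guess and a range width.
# In a learned system these would come from a predictor; here they are made up.
coarse_depth = [[520.0, 515.0], [480.0, 610.0]]
range_width = [[40.0, 40.0], [120.0, 200.0]]  # wider where depth is less certain

# Each pixel gets its own set of hypotheses instead of one global depth interval.
hypotheses = [
    [depth_hypotheses(d, w, 5) for d, w in zip(d_row, w_row)]
    for d_row, w_row in zip(coarse_depth, range_width)
]

# Assumed schedule: the range is halved at every cascade stage.
stage_widths = cascade_ranges(40.0, 0.5, 3)
```

With a fixed global interval every pixel would share the same hypotheses; here a pixel with a wide range samples depths coarsely over a large span, while later stages sample finely around the current estimate.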

Problem

Research questions and friction points this paper is trying to address.

Improving multi-view depth estimation accuracy in aerial images

Addressing varying depth ranges in aerial stereo settings

Enhancing feature-matching discriminability for low-detail aerial images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive depth range predictor using cross-attention

Normal-guided cost aggregation for aerial images

Normal-guided depth refinement surpassing RGB-guided methods
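As background on why surface normals are useful geometric cues, the standard local-plane relation below shows how a normal links the depths of neighbouring pixels. It is a general result from plane geometry, not a formula taken from the paper, and the symbols are assumptions: $K$ is the camera intrinsic matrix, $\tilde{p}$ and $\tilde{q}$ are homogeneous pixel coordinates, $d_p$ and $d_q$ are their depths, and $n$ is the surface normal at $p$ in camera coordinates.

```latex
% Back-projected 3D points of pixels p and q
X_p = d_p \, K^{-1} \tilde{p}, \qquad X_q = d_q \, K^{-1} \tilde{q}

% If q lies on the local plane through X_p with normal n:
n^{\top} X_q = n^{\top} X_p

% Solving for the depth of q:
d_q = d_p \, \frac{n^{\top} K^{-1} \tilde{p}}{n^{\top} K^{-1} \tilde{q}}
```

Under this assumption, a reliable normal lets depth be propagated to nearby pixels on the same surface, which is one plausible reading of how normal guidance can add geometric awareness where texture is weak.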

🔎 Similar Papers

CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement