Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues

📅 2025-06-06
🤖 AI Summary
In aerial multi-view stereo (MVS) reconstruction, large dynamic depth ranges and texture scarcity impede reliable feature matching and cause significant depth estimation errors. To address these challenges in high-altitude scenarios, this paper proposes an adaptive MVS framework. Our method introduces three key innovations: (1) a novel monocular-depth- and surface-normal-guided adaptive depth range predictor, enabling scene-aware depth hypothesis initialization; (2) a normal-guided cost aggregation and depth refinement module that relaxes the restrictive fixed-depth-interval assumption; and (3) a lightweight architecture integrating cross-attention-based disparity learning, geometric cue fusion, and cascaded optimization. Evaluated on WHU, LuoJia-MVS, and München aerial benchmarks, our approach achieves state-of-the-art accuracy while significantly accelerating inference and reducing computational complexity compared to mainstream methods.

πŸ“ Abstract
Three-dimensional digital urban reconstruction from multi-view aerial images is a critical application where deep multi-view stereo (MVS) methods outperform traditional techniques. However, existing methods commonly overlook the key differences between aerial and close-range settings, such as depth ranges that vary along epipolar lines and feature matching that loses sensitivity on low-detail aerial images. To address these issues, we propose Adaptive Depth Range MVS (ADR-MVS), which integrates monocular geometric cues to improve multi-view depth estimation accuracy. The key component of ADR-MVS is the depth range predictor, which generates adaptive range maps from depth and normal estimates using cross-attention discrepancy learning. In the first stage, the range map derived from monocular cues breaks through predefined depth boundaries, improving feature-matching discriminability and mitigating convergence to local optima. In later stages, the inferred range maps are progressively narrowed, ultimately aligning with the cascaded MVS framework for precise depth regression. Moreover, a normal-guided cost aggregation operation is specially devised for aerial stereo images to improve geometric awareness within the cost volume. Finally, we introduce a normal-guided depth refinement module that surpasses existing RGB-guided techniques. Experimental results demonstrate that ADR-MVS achieves state-of-the-art performance on the WHU, LuoJia-MVS, and München datasets while exhibiting lower computational complexity.
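The idea of per-pixel range maps that narrow across cascade stages can be illustrated with a small sketch. This is not the paper's predictor, which is learned with cross-attention. It only shows how depth hypotheses could be sampled once a per-pixel center depth and half-range are available. The function names `depth_hypotheses` and `narrow_range`, the uniform sampling, and the halving factor are all assumptions made for illustration.

```python
def depth_hypotheses(center, half_range, num_planes):
    """Sample evenly spaced depth hypotheses per pixel.

    center and half_range are 2D lists (rows of pixels). Each pixel gets
    num_planes depths spanning [center - half_range, center + half_range].
    """
    if num_planes < 2:
        raise ValueError("num_planes must be at least 2")
    volume = []
    for center_row, range_row in zip(center, half_range):
        row = []
        for c, r in zip(center_row, range_row):
            lowest = c - r
            step = 2.0 * r / (num_planes - 1)
            row.append([lowest + k * step for k in range(num_planes)])
        volume.append(row)
    return volume


def narrow_range(half_range, factor=0.5):
    """Shrink every per-pixel half-range (hypothetical fixed factor)."""
    return [[r * factor for r in row] for row in half_range]


# Two pixels with different ranges: a flat roof and a deep street canyon.
center = [[100.0, 250.0]]
half_range = [[20.0, 60.0]]

stage1 = depth_hypotheses(center, half_range, num_planes=5)
stage2 = depth_hypotheses(center, narrow_range(half_range), num_planes=5)
```

With a fixed global range both pixels would share the same hypothesis spacing. Here the second pixel gets a wider search interval in the first stage, and both intervals shrink around the current estimate in the next stage.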
Problem

Research questions and friction points this paper is trying to address.

Improving multi-view depth estimation accuracy in aerial images
Addressing varying depth ranges in aerial stereo settings
Enhancing feature-matching discriminability for low-detail aerial images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive depth range predictor using cross-attention
Normal-guided cost aggregation for aerial images
Normal-guided depth refinement surpassing RGB-guided methods
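One way to see why surface normals add geometric awareness is the standard local-plane relation: if a pixel p has depth d_p and normal n, the plane through its 3D point predicts the depth of a neighbor q as d_q = d_p (n · r_p) / (n · r_q), where r_p and r_q are the back-projected pixel rays. The sketch below implements only this textbook relation under an assumed pinhole camera with z-depth. The paper's aggregation and refinement modules are learned and are not reproduced here, and the helper names and intrinsics are hypothetical.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a ray with unit z (pinhole model)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def propagate_depth(depth_p, normal, ray_p, ray_q, eps=1e-8):
    """Predict the z-depth at ray_q from the local plane at ray_p.

    The plane passes through depth_p * ray_p with the given normal
    (the normal does not need to be unit length).
    """
    denom = dot(normal, ray_q)
    if abs(denom) < eps:
        raise ValueError("ray_q is (nearly) parallel to the plane")
    return depth_p * dot(normal, ray_p) / denom


# Assumed intrinsics, for illustration only.
fx = fy = 1000.0
cx = cy = 500.0
ray_p = pixel_ray(500.0, 500.0, fx, fy, cx, cy)
ray_q = pixel_ray(600.0, 500.0, fx, fy, cx, cy)

flat = propagate_depth(10.0, (0.0, 0.0, -1.0), ray_p, ray_q)
slanted = propagate_depth(10.0, (1.0, 0.0, -1.0), ray_p, ray_q)
```

For the fronto-parallel normal the neighbor keeps the same depth, while the slanted normal predicts a larger depth at q. A fixed depth interval cannot express that difference, which is the kind of cue normal guidance supplies.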
Authors
Yimei Liu
Department of Information Science and Technology, Ocean University of China
Yakun Ju
Assistant Professor, University of Leicester, UK
Computational Photography, Underwater Vision, Image Processing
Yuan Rao
Department of Information Science and Technology, Ocean University of China
Hao Fan
Zhejiang A&F University
Recommender System
Junyu Dong
Ocean University of China
Feng Gao
Department of Information Science and Technology, Ocean University of China
Qian Du
Department of Electrical and Computer Engineering, Mississippi State University