Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Omnidirectional stereo matching suffers from large depth errors and poor generalization in real-world scenarios because high-quality annotated data is scarce. Method: This paper proposes DFI-OmniStereo, an iterative omnidirectional stereo matching framework that integrates a pre-trained monocular relative depth foundation model (e.g., DepthAnything). It embeds relative depth features into cost volume construction and optimization, and introduces a two-stage training strategy: (i) relative depth priors guide initial correspondence estimation; (ii) scale-invariant fine-tuning recovers absolute depth. The method combines relative depth features, iterative cost volume refinement, and scale-invariant learning. Results: Evaluated on the real-world Helvipad dataset, the approach achieves state-of-the-art performance, reducing disparity MAE by approximately 16% relative to the previous best omnidirectional stereo method. It improves depth estimation accuracy and robustness across varying illumination conditions, object distances, and complex environments.
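
The summary mentions scale-invariant fine-tuning but does not give the loss. As a minimal sketch, assuming a scale-invariant log error of the kind commonly used in monocular depth estimation (not necessarily the paper's exact formulation), the idea can be illustrated as follows. The function name and the `lam` parameter are hypothetical and chosen for illustration only.

```python
import math


def scale_invariant_log_loss(pred, target, lam=1.0):
    """Scale-invariant log error over paired depth values.

    Assumption: this is a generic formulation, not the paper's loss.
    With lam=1.0 a global scale factor on `pred` does not change the
    result; with lam=0.0 it reduces to mean squared error in log space.
    """
    # Keep only pairs where both values are positive (log is defined).
    diffs = [
        math.log(p) - math.log(t)
        for p, t in zip(pred, target)
        if p > 0 and t > 0
    ]
    if not diffs:
        raise ValueError("no valid (positive) depth pairs")
    n = len(diffs)
    mean_sq = sum(d * d for d in diffs) / n  # mean of squared log differences
    mean = sum(diffs) / n                    # mean log difference (global scale offset)
    return mean_sq - lam * mean * mean
```

With `lam=1.0`, a prediction that matches the ground truth up to a constant factor yields a loss of approximately zero, which is why such a loss suits features that come from a relative (scale-free) depth model.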

📝 Abstract
Omnidirectional depth perception is essential for mobile robotics applications that require scene understanding across a full 360° field of view. Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps without relying on expensive active sensing. However, existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments, depth ranges, and lighting conditions, due to the scarcity of real-world data. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation within an iterative optimization-based stereo matching architecture. We introduce a dedicated two-stage training strategy to utilize the relative monocular depth features for our omnidirectional stereo matching before scale-invariant fine-tuning. DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, reducing disparity MAE by approximately 16% compared to the previous best omnidirectional stereo method.
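
To make the headline metric concrete, here is a small sketch of how a disparity MAE and a relative reduction between two methods could be computed. The function names and the validity mask are assumptions for illustration, and the numbers in the usage note are made up; they are not results from the paper.

```python
def disparity_mae(pred, target, valid):
    """Mean absolute disparity error over pixels flagged as valid."""
    # Skip pixels without ground truth (valid flag is False).
    errors = [abs(p - t) for p, t, v in zip(pred, target, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return sum(errors) / len(errors)


def relative_reduction(baseline_mae, new_mae):
    """Fraction by which new_mae is lower than baseline_mae."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - new_mae) / baseline_mae
```

For example, with hypothetical values, a baseline MAE of 0.25 and a new MAE of 0.21 give `relative_reduction(0.25, 0.21)` of about 0.16, which corresponds to a reduction of roughly 16%.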

Problem

Research questions and friction points this paper is trying to address.

Improving omnidirectional stereo matching accuracy in diverse environments
Leveraging pre-trained depth models for better stereo depth estimation
Addressing limited real-world data for omnidirectional depth perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

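Integrates a large-scale pre-trained foundation model for relative monocular depth estimation
Two-stage training strategy that ends with scale-invariant fine-tuning
Iterative optimization-based stereo matching architecture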