RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Motivated by the scarcity of large-scale annotated 360° data, this paper proposes a training-free, robust method for omnidirectional depth estimation. First, the spherical image is mapped to a six-face cubemap, and a pre-trained perspective foundation model estimates per-face depth and surface normals. Because each face is predicted independently, depth scales are inconsistent across faces; to resolve this, a graph-based optimization framework with per-face learnable scale parameters aligns the global depth scale while preserving 3D geometric consistency, all without end-to-end training. Evaluated on Matterport3D, Stanford2D3D, and 360Loc, the approach significantly outperforms existing unsupervised and self-supervised methods. Downstream validation further demonstrates its effectiveness: feature matching accuracy improves by 3.2–5.4%, and Structure-from-Motion (SfM) reconstruction AUC@5 increases by 0.2–9.7%.
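The per-face scale alignment described above can be sketched as a least-squares problem on the face graph. This is a minimal illustration, not the paper's actual formulation: the log-domain objective, the relative-scale edge measurements `r_ij`, and the function name are all assumptions.

```python
import numpy as np

# Hypothetical sketch: global scale alignment over cubemap faces.
# Each face i has a predicted depth map known only up to a scale s_i.
# From depths shared along adjacent face boundaries one could estimate
# the relative scale r_ij ~ s_j / s_i; solving for all s_i in the log
# domain turns this into linear least squares on the face graph.

def align_face_scales(edges, num_faces=6, anchor=0):
    """edges: list of (i, j, r_ij), with r_ij the measured scale of face j
    relative to face i. Returns per-face scales with scales[anchor] == 1."""
    rows, rhs = [], []
    for i, j, r in edges:
        row = np.zeros(num_faces)
        row[j], row[i] = 1.0, -1.0          # log s_j - log s_i = log r_ij
        rows.append(row)
        rhs.append(np.log(r))
    # Fix the gauge: pin the anchor face's log-scale to 0.
    row = np.zeros(num_faces)
    row[anchor] = 1.0
    rows.append(row)
    rhs.append(0.0)
    log_s, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(log_s)

# Toy example: known scales, noiseless pairwise measurements.
true = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 2.5])
edges = [(i, j, true[j] / true[i]) for i in range(6) for j in range(6) if i < j]
scales = align_face_scales(edges)  # recovers `true` up to the anchor gauge
```

With noisy boundary measurements the least-squares solution gives the globally most consistent scales rather than an exact recovery; the paper's optimization additionally constrains surface normals, which this toy omits.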

📝 Abstract
The increasing use of 360 images across various domains has emphasized the need for robust depth estimation techniques tailored for omnidirectional images. However, obtaining large-scale labeled datasets for 360 depth estimation remains a significant challenge. In this paper, we propose RPG360, a training-free robust 360 monocular depth estimation method that leverages perspective foundation models and graph optimization. Our approach converts 360 images into six-face cubemap representations, where a perspective foundation model is employed to estimate depth and surface normals. To address depth scale inconsistencies across different faces of the cubemap, we introduce a novel depth scale alignment technique using graph-based optimization, which parameterizes the predicted depth and normal maps while incorporating an additional per-face scale parameter. This optimization ensures depth scale consistency across the six-face cubemap while preserving 3D structural integrity. Furthermore, as foundation models exhibit inherent robustness in zero-shot settings, our method achieves superior performance across diverse datasets, including Matterport3D, Stanford2D3D, and 360Loc. We also demonstrate the versatility of our depth estimation approach by validating its benefits in downstream tasks, improving feature matching by 3.2–5.4% and Structure from Motion by 0.2–9.7% in AUC@5.
Problem

Research questions and friction points this paper is trying to address.

Develops robust 360 monocular depth estimation without training data
Aligns inconsistent depth scales across cubemap faces via graph optimization
Leverages perspective foundation models for zero-shot omnidirectional depth prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses perspective foundation models for depth estimation
Applies graph optimization for depth scale alignment
Converts 360 images to cubemap representations
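The cubemap conversion in the last bullet can be illustrated with a small sketch that samples one face of the cubemap from an equirectangular image. The paper does not specify its conversion routine; the function name, the +z face convention, and the nearest-neighbor lookup are assumptions for illustration.

```python
import numpy as np

def equirect_to_cube_face(equirect, face_size=256):
    """Sample the front (+z) cubemap face from an equirectangular image
    (H x W x C) via nearest-neighbor lookup."""
    h, w = equirect.shape[:2]
    # Pixel grid on the face plane z = 1, with x, y in [-1, 1].
    u = np.linspace(-1, 1, face_size)
    x, y = np.meshgrid(u, u)
    z = np.ones_like(x)
    # Ray direction -> spherical angles -> equirectangular pixel coords.
    lon = np.arctan2(x, z)                       # longitude in [-pi/4, pi/4]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))    # latitude of each ray
    col = ((lon / (2 * np.pi) + 0.5) * (w - 1)).round().astype(int)
    row = ((lat / np.pi + 0.5) * (h - 1)).round().astype(int)
    return equirect[row, col]

# Usage: extract a 32x32 front face from a 64x128 panorama.
face = equirect_to_cube_face(np.zeros((64, 128, 3)), face_size=32)
```

Repeating this for all six face orientations (by rotating the ray directions) yields the six perspective views fed to the foundation model; a production pipeline would typically use bilinear interpolation rather than nearest-neighbor lookup.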