Cross360: 360° Monocular Depth Estimation via Cross Projections Across Scales

📅 2026-01-24
🏛️ IEEE Transactions on Image Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges in 360° monocular depth estimation, where maintaining both global continuity and local consistency is difficult, and multi-projection fusion often suffers from boundary inconsistencies. To this end, the authors propose a dual-view modeling approach that integrates low-distortion tangent-plane local projections with equirectangular global projections. A Cross Projection Feature Alignment module leverages cross-attention mechanisms to align contextual features between local and global views, while a Progressive Feature Aggregation with Attention module hierarchically fuses multi-scale features to refine depth representations. The proposed method achieves state-of-the-art performance across multiple 360° depth estimation benchmarks, demonstrating particularly strong results in complete panoramic scenes.
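The core mechanism the summary describes, cross-attention that lets each local tangent-patch feature attend to the global equirectangular features, can be sketched in plain numpy. This is not the authors' released code; it is a minimal illustration with hypothetical names and shapes, omitting the learned query/key/value projections and multi-head structure a real module would have.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_projection_attention(tangent_tokens, erp_tokens):
    """Align local tangent-patch features (queries) with global
    equirectangular (ERP) features (keys/values) via scaled
    dot-product cross-attention.

    tangent_tokens: (n_local, d)  -- features of one tangent patch
    erp_tokens:     (n_global, d) -- features covering the full 360° view
    Returns:        (n_local, d)  -- globally contextualized patch features
    """
    d = tangent_tokens.shape[-1]
    scores = tangent_tokens @ erp_tokens.T / np.sqrt(d)  # (n_local, n_global)
    weights = softmax(scores, axis=-1)                   # attention over the 360° context
    aligned = weights @ erp_tokens                       # (n_local, d)
    return tangent_tokens + aligned                      # residual keeps the local signal
```

The residual connection reflects the usual transformer design choice: the global context refines, rather than replaces, the local tangent-plane features.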

📝 Abstract
360° depth estimation is a challenging research problem due to the difficulty of finding a representation that both preserves global continuity and avoids distortion in spherical images. Existing methods attempt to leverage complementary information from multiple projections, but struggle with balancing global and local consistency. Their local patch features have limited global perception, and the combined global representation does not address discrepancies in feature extraction at the boundaries between patches. To address these issues, we propose Cross360, a novel cross-attention-based architecture integrating local and global information using less-distorted tangent patches along with equirectangular features. Our Cross Projection Feature Alignment module employs cross-attention to align local tangent projection features with the equirectangular projection's 360° field of view, ensuring each tangent projection patch is aware of the global context. Additionally, our Progressive Feature Aggregation with Attention module refines multi-scaled features progressively, enhancing depth estimation accuracy. Cross360 significantly outperforms existing methods across most benchmark datasets, especially those in which the entire 360° image is available, demonstrating its effectiveness in accurate and globally consistent depth estimation. The code and model are available at https://github.com/huangkun101230/Cross360.
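The "less-distorted tangent patches" in the abstract refer to gnomonic (tangent-plane) projections of the sphere, the standard way to cut low-distortion local views out of an equirectangular panorama. A minimal sketch of the textbook gnomonic forward projection (not the paper's implementation; coordinates are assumed to be in radians):

```python
import numpy as np

def gnomonic_project(lon, lat, lon0=0.0, lat0=0.0):
    """Project spherical coordinates (lon, lat) onto the plane tangent
    to the unit sphere at (lon0, lat0), using the standard gnomonic
    forward equations. Points with cos_c <= 0 lie outside the visible
    hemisphere of this tangent patch.
    """
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y
```

At the tangent point the projection is distortion-free (it maps to the plane's origin), and distortion grows with angular distance, which is why tangent-patch methods tile the sphere with many such patches and then need a mechanism, such as the cross-attention above the abstract describes, to keep the patches globally consistent.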
Problem

Research questions and friction points this paper is trying to address.

360° depth estimation
monocular depth estimation
spherical images
global consistency
projection representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-attention
360° depth estimation
Tangent projection
Feature alignment
Multi-scale aggregation
Kun Huang
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
Fang-Lue Zhang
Senior Lecturer (Associate Professor), Victoria University of Wellington, New Zealand
Computer Graphics · Image and Video Processing · VR/AR/MR · Computer Vision
N. Dodgson
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand