🤖 AI Summary
This work addresses the challenges in 360° monocular depth estimation, where maintaining both global continuity and local consistency is difficult, and multi-projection fusion often suffers from boundary inconsistencies. To this end, the authors propose a dual-view modeling approach that integrates low-distortion tangent-plane local projections with equirectangular global projections. A Cross Projection Feature Alignment module leverages cross-attention mechanisms to align contextual features between local and global views, while a Progressive Feature Aggregation with Attention module hierarchically fuses multi-scale features to refine depth representations. The proposed method achieves state-of-the-art performance across multiple 360° depth estimation benchmarks, demonstrating particularly strong results in complete panoramic scenes.
📝 Abstract
360° depth estimation is a challenging research problem due to the difficulty of finding a representation that both preserves global continuity and avoids distortion in spherical images. Existing methods attempt to leverage complementary information from multiple projections, but struggle to balance global and local consistency: their local patch features have limited global perception, and the combined global representation does not address discrepancies in feature extraction at the boundaries between patches. To address these issues, we propose Cross360, a novel cross-attention-based architecture that integrates local and global information using less-distorted tangent patches along with equirectangular features. Our Cross Projection Feature Alignment module employs cross-attention to align local tangent projection features with the equirectangular projection's 360° field of view, ensuring each tangent projection patch is aware of the global context. Additionally, our Progressive Feature Aggregation with Attention module refines multi-scale features progressively, enhancing depth estimation accuracy. Cross360 significantly outperforms existing methods across most benchmark datasets, especially those in which the entire 360° image is available, demonstrating its effectiveness in accurate and globally consistent depth estimation. The code and model are available at https://github.com/huangkun101230/Cross360.
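To make the cross-projection alignment idea concrete, here is a minimal NumPy sketch of single-head cross-attention in which tokens from one tangent-plane patch act as queries over the global equirectangular (ERP) tokens. All names, shapes, and the residual connection are illustrative assumptions, not the authors' actual implementation, which lives in the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(tangent_feats, erp_feats, Wq, Wk, Wv):
    """Single-head cross-attention (illustrative): tangent-patch tokens
    [N_t, d] query the global ERP tokens [N_e, d], so each local token
    aggregates context from the full 360° field of view."""
    Q = tangent_feats @ Wq                     # queries from local tangent features
    K = erp_feats @ Wk                         # keys from global ERP features
    V = erp_feats @ Wv                         # values from global ERP features
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # [N_t, N_e] attention weights
    return tangent_feats + attn @ V            # residual: local tokens enriched with global context

rng = np.random.default_rng(0)
d = 8
tangent = rng.standard_normal((16, d))   # e.g. 16 tokens from one tangent patch
erp = rng.standard_normal((64, d))       # e.g. 64 tokens from the ERP view
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = cross_attend(tangent, erp, Wq, Wk, Wv)
print(out.shape)  # (16, 8): one globally-aware feature per tangent token
```

In the full model this alignment would be applied per patch and per scale, with the aligned features then fused hierarchically by the aggregation module.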