🤖 AI Summary
Omnidirectional depth estimation suffers from the inherent distortions of panoramic images, particularly stretching along the vertical axis, which existing methods model inadequately because they give insufficient consideration to projection geometry. This paper proposes a two-stage framework: (1) stereo matching between multiple cylindrical panoramas, followed by (2) cross-view weighted fusion of the resulting depth maps, augmented with a circular attention module that explicitly corrects the vertical distortion. The authors compare spherical and cylindrical projections both theoretically and empirically, showing that cylindrical projection is better suited to stereo matching in omnidirectional settings. The architecture uses only standard network components, with no custom operators, which greatly simplifies deployment to embedded devices. On the Deep360 and 3D60 benchmarks, the method reduces depth MAE by 18.8% and 19.9%, respectively, achieving state-of-the-art performance in omnidirectional depth estimation.
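To make the projection comparison concrete, the remap below sketches how an equirectangular panorama can be resampled into a central cylindrical projection, where a latitude phi maps to height tan(phi) on the cylinder. The function name, the nearest-neighbor row remap, and the `max_lat` field-of-view cutoff are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def equirect_to_cylindrical(pano, out_h, max_lat=np.pi / 3):
    """Resample an equirectangular panorama onto a central cylinder.

    pano: (H, W, C) equirectangular image; rows cover latitude [-pi/2, pi/2].
    Central cylindrical projection maps latitude phi to y = tan(phi), so the
    output rows sample latitudes non-uniformly; a max_lat cutoff is required
    because tan diverges at the poles. Columns (longitude) are unchanged.
    """
    H, W, _ = pano.shape
    # Uniform grid on the cylinder, then invert y = tan(phi) to get latitude.
    y = np.linspace(-np.tan(max_lat), np.tan(max_lat), out_h)
    phi = np.arctan(y)                           # latitude of each output row
    # Map latitude back to a source row index (nearest-neighbor for brevity).
    src_rows = ((phi / np.pi) + 0.5) * (H - 1)
    rows = np.clip(np.round(src_rows).astype(int), 0, H - 1)
    return pano[rows]
```

Unlike the spherical (equirectangular) image, vertical lines in the scene stay straight in this projection, which is one intuition for why cylindrical panoramas can suit stereo matching better.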
📝 Abstract
We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework for omnidirectional depth estimation via stereo matching between multiple cylindrical panoramas. MCPDepth performs initial stereo matching on cylindrical panoramas and then fuses the resulting depth maps across views. A circular attention module is employed to overcome the distortion along the vertical axis. MCPDepth uses only standard network components, simplifying deployment to embedded devices while outperforming previous methods that require custom kernels. We compare spherical and cylindrical projections for stereo matching both theoretically and experimentally, highlighting the advantages of the cylindrical projection. MCPDepth achieves state-of-the-art performance with an 18.8% reduction in mean absolute error (MAE) for depth on the outdoor synthetic dataset Deep360 and a 19.9% reduction on the indoor real-scene dataset 3D60.
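The circular attention idea can be sketched as self-attention computed per image column along the vertical axis, where cylindrical projection stretches content most. This is a minimal single-head sketch under that assumption; the function name, the per-column loop, and the learned matrices `Wq`, `Wk`, `Wv` are illustrative, not the paper's exact module:

```python
import numpy as np

def vertical_attention(feat, Wq, Wk, Wv):
    """Single-head self-attention along the vertical (height) axis.

    feat: (H, W, C) feature map from a cylindrical panorama.
    Attention is computed independently for each column, letting every row
    attend to all other rows, so features stretched vertically by the
    projection can be re-aggregated across the full height.
    """
    H, W, C = feat.shape
    out = np.empty_like(feat)
    for x in range(W):
        col = feat[:, x, :]                              # (H, C)
        q, k, v = col @ Wq, col @ Wk, col @ Wv
        scores = (q @ k.T) / np.sqrt(k.shape[-1])        # (H, H)
        scores -= scores.max(axis=-1, keepdims=True)     # stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        out[:, x, :] = attn @ v
    return out
```

Because every operation here is a plain matrix multiply and softmax, a module of this shape needs no custom kernels, consistent with the paper's emphasis on standard components for embedded deployment.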