🤖 AI Summary
Panoramic image segmentation suffers from a lack of large-scale annotated datasets, leading to reliance on planar pre-trained models that fail to account for spherical distortion and boundary discontinuities. To address this, we propose a spherical convolutional sampling method compatible with standard 2D backbone networks. Our approach features two key innovations: (1) mapping planar pre-trained weights onto a spherical discrete sampling grid to enable distortion-aware feature extraction; and (2) introducing a spherical-feature-guided channel attention mechanism to enhance representation learning for critical regions. Crucially, our method requires no architectural modifications to the backbone and is plug-and-play. Experiments on the widely used indoor panoramic benchmark Stanford2D3D demonstrate significant improvements in segmentation accuracy, confirming both effectiveness and generalizability across diverse scenes and backbone architectures.
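To make the idea of a "spherical discrete sampling grid" concrete, the sketch below computes latitude-dependent sampling locations for a k x k convolution kernel on an equirectangular panorama via an inverse gnomonic (tangent-plane) projection. This is an illustrative assumption: the summary does not specify which spherical projection the paper uses, and the function name and `fov` parameter are hypothetical.

```python
import numpy as np

def spherical_kernel_offsets(lat, lon, k=3, fov=np.radians(3.0)):
    """Illustrative sketch: re-project a k x k kernel grid from the
    tangent plane at (lat, lon) onto the sphere, giving the spherical
    sampling locations for one kernel position.  `fov` (the kernel's
    angular extent) is an assumed hyperparameter, not from the paper."""
    # Regular kernel grid on the tangent plane centred at (lat, lon).
    half = np.tan(fov / 2.0)
    xs = np.linspace(-half, half, k)
    x, y = np.meshgrid(xs, xs)
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)                      # angular distance from centre
    rho_safe = np.where(rho == 0, 1.0, rho) # avoid 0/0 at the kernel centre
    # Standard inverse gnomonic projection: tangent plane -> sphere.
    new_lat = np.arcsin(np.cos(c) * np.sin(lat)
                        + y * np.sin(c) * np.cos(lat) / rho_safe)
    new_lon = lon + np.arctan2(
        x * np.sin(c),
        rho * np.cos(lat) * np.cos(c) - y * np.sin(lat) * np.sin(c))
    return new_lat, new_lon
```

Sampling an unmodified 2D kernel at these locations (e.g. with bilinear interpolation) is what lets planar pre-trained weights be reused without architectural changes: near the poles the longitudinal spread of the grid widens automatically, compensating for equirectangular stretching.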
📝 Abstract
Due to the current lack of million-scale panoramic datasets, tasks involving panoramic images predominantly rely on backbone networks pre-trained on two-dimensional image benchmarks. However, these networks are not equipped to handle the distortions and discontinuities inherent in panoramic images, which adversely affects their performance on such tasks. In this paper, we introduce a novel spherical sampling method for panoramic images that enables the direct use of existing models pre-trained on two-dimensional images. Our method applies spherical discrete sampling to the weights of the pre-trained models, effectively mitigating distortion while providing favorable initial values for training. Additionally, we apply the proposed sampling method to panoramic image segmentation, using features obtained from the spherical model as masks for channel-specific attention, which yields strong results on the commonly used indoor dataset Stanford2D3D.
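The channel attention described above can be sketched in a squeeze-and-excitation style: per-channel statistics of the spherical features gate the channels of the planar backbone features. This is a minimal interpretation under stated assumptions; the function name, the bottleneck weights `w1`/`w2`, and the global-average squeeze are all hypothetical choices, since the abstract only says spherical features serve as masks for channel attention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spherical_guided_channel_attention(planar_feat, spherical_feat, w1, w2):
    """Illustrative sketch: derive a per-channel mask from spherical
    features and use it to gate planar backbone features.
    planar_feat, spherical_feat: (C, H, W); w1: (C//r, C); w2: (C, C//r),
    where r is an assumed bottleneck reduction ratio."""
    squeeze = spherical_feat.mean(axis=(1, 2))           # (C,) per-channel stats
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0)) # (C,) mask in (0, 1)
    return planar_feat * excite[:, None, None]           # gate planar channels
```

Because the mask is computed from the distortion-aware spherical branch but applied to the planar features, channels that respond to geometry-sensitive regions can be emphasized without changing the backbone itself.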