Towards Accurate Single Panoramic 3D Detection: A Semantic Gaussian Centric Approach

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limitations of discrete grid representations in monocular panoramic 2D-to-3D mapping, which break geometric continuity and represent features inefficiently. The authors propose PanoGSDet, a framework they describe as the first to use a continuous semantic 3D Gaussian representation for panoramic 3D object detection. The method extracts semantic and depth features from an equirectangular panorama and lifts them into semantically enriched Gaussians in spherical space. A Gaussian optimization module coupled with a Gaussian-guided 3D detection head makes the pipeline trainable end to end. On the Structured3D dataset, the authors report that PanoGSDet significantly outperforms existing approaches, which they attribute to the continuous Gaussian representation preserving geometric continuity and improving 3D feature expressiveness.
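To make the contrast with discrete grids concrete, the following is a minimal sketch of how a set of 3D Gaussians can define a feature field that is queryable at any continuous 3D location. It is not the PanoGSDet implementation: the function name `query_gaussian_field`, the isotropic (single-scale) Gaussians, and the unnormalized weighted sum are all assumptions chosen for brevity.

```python
import numpy as np

def query_gaussian_field(points, means, scales, features):
    """Evaluate a Gaussian-weighted feature field at arbitrary 3D points.

    Assumption (illustrative only): each Gaussian is isotropic with one scale.
    points:   (Q, 3) query locations, not restricted to any grid
    means:    (N, 3) Gaussian centers
    scales:   (N,)   Gaussian standard deviations
    features: (N, C) semantic feature attached to each Gaussian
    returns:  (Q, C) unnormalized weighted sum of features
    """
    diff = points[:, None, :] - means[None, :, :]       # (Q, N, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                # (Q, N)
    weights = np.exp(-0.5 * sq_dist / scales[None, :] ** 2)
    return weights @ features                           # (Q, C)
```

Because the weights vary smoothly with the query position, nearby points receive similar features, whereas a voxel grid changes value only at cell boundaries.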
📝 Abstract
Three-dimensional object detection in panoramic imagery is crucial for comprehensive scene understanding, yet accurately mapping 2D features to 3D remains a significant challenge. Prevailing methods often project 2D features onto discrete 3D grids, which break geometric continuity and limit representation efficiency. To overcome this limitation, this paper proposes PanoGSDet, a monocular panoramic 3D detection framework built upon continuous semantic 3D Gaussian representations. The framework comprises a panoramic depth estimation component and a semantic Gaussian component. The panoramic depth estimation component extracts equirectangular semantic and depth features from the monocular panorama input. The semantic Gaussian component includes a semantic Gaussian lifting module that projects spherical features into 3D semantic Gaussians, a semantic Gaussian optimization module that refines these Gaussians, and a Gaussian-guided prediction head that generates 3D bounding boxes from the optimized Gaussian representations. Extensive experiments on the Structured3D dataset demonstrate that our method significantly outperforms existing methods.
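The lifting step described above can be illustrated with the standard equirectangular unprojection, which turns each pixel and its depth into a 3D point that could serve as a Gaussian center. This is a sketch under stated assumptions, not the paper's module: the function name `lift_equirect_to_gaussian_centers`, the y-up axis convention, and the interpretation of depth as radial distance from the camera are all hypothetical choices.

```python
import numpy as np

def lift_equirect_to_gaussian_centers(depth, features):
    """Unproject an equirectangular depth map into 3D points with features.

    Assumptions (illustrative only): depth is radial distance from the
    camera, the y axis points up, and longitude 0 looks along +z.
    depth:    (H, W)    per-pixel radial distance
    features: (H, W, C) per-pixel semantic features
    returns:  centers (H*W, 3), feats (H*W, C)
    """
    H, W = depth.shape
    u = (np.arange(W) + 0.5) / W           # horizontal pixel centers in (0, 1)
    v = (np.arange(H) + 0.5) / H           # vertical pixel centers in (0, 1)
    lon = (u - 0.5) * 2.0 * np.pi          # longitude in (-pi, pi)
    lat = (0.5 - v) * np.pi                # latitude in (-pi/2, pi/2), top row positive
    lon, lat = np.meshgrid(lon, lat)       # both (H, W)
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )                                      # unit viewing directions, (H, W, 3)
    centers = depth[..., None] * dirs      # scale each direction by its depth
    return centers.reshape(-1, 3), features.reshape(H * W, -1)
```

In a full pipeline each center would also need a scale, orientation, and opacity before optimization; those are omitted here because the abstract does not specify how they are initialized.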
Problem

Research questions and friction points this paper is trying to address.

3D object detection
panoramic imagery
2D-to-3D mapping
geometric continuity
representation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Gaussian
Monocular Panoramic 3D Detection
Continuous 3D Representation
Gaussian Lifting
Panoramic Depth Estimation