PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

📅 2023-09-19
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
To address the scarcity of 2D semantic and instance annotations under rare viewpoints in autonomous driving, which severely limits the generalization of perception models, this paper proposes a geometry-semantics co-driven panoramic label generation framework. The method jointly optimizes a 3D semantic field by fusing coarse 3D geometric priors with noisy 2D semantic cues, allowing 3D geometry and 2D semantics to mutually enhance each other. A semantic-field denoising mechanism is introduced to suppress 2D annotation noise. Leveraging NeRF-based rendering, hybrid features from hash encoding and MLPs, and pseudo-ground-truth-guided supervision, the approach synthesizes high-fidelity, omnidirectional, multi-view, and spatiotemporally consistent panoramic labels, including appearance, semantic, and instance masks. Evaluated on KITTI-360, the method achieves state-of-the-art cross-view label transfer and significantly improves generalization across unseen viewpoints.
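The pseudo-ground-truth-guided supervision mentioned above amounts to penalizing disagreement between the rendered per-pixel class scores and labels from an off-the-shelf 2D predictor. A minimal sketch of such a loss, with hypothetical shapes and names (the paper's actual loss weighting is richer):

```python
import numpy as np

def pseudo_gt_loss(rendered_logits, pseudo_labels):
    """Cross-entropy against noisy 2D pseudo ground truth.

    rendered_logits: (P, C) volume-rendered class scores for P pixels
    pseudo_labels:   (P,)   class ids from an off-the-shelf 2D predictor

    Illustrative shapes/names, not the paper's actual API.
    """
    # Numerically stable log-softmax over the class dimension
    z = rendered_logits - rendered_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each pixel's pseudo label
    return -log_probs[np.arange(len(pseudo_labels)), pseudo_labels].mean()
```

Because the rendered scores depend on the scene's density field, minimizing this loss pushes geometry toward configurations consistent with the 2D cues, which is the geometry-semantics coupling the summary describes.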
📝 Abstract
Training perception systems for self-driving cars requires substantial 2D annotations that are labor-intensive to label manually. While existing datasets provide rich annotations on pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability of perception models. In this paper, we present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate high-quality panoptic labels and images from any viewpoint. Our key insight lies in exploiting the complementarity of 3D and 2D priors to mutually enhance geometry and semantics. Specifically, we propose to leverage coarse 3D bounding primitives and noisy 2D semantic and instance predictions to guide geometry optimization, by encouraging predicted labels to match panoptic pseudo ground truth. Simultaneously, the improved geometry assists in filtering 3D and 2D annotation noise by fusing semantics in 3D space via a learned semantic field. To further enhance appearance, we combine MLP and hash grids to yield hybrid scene features, striking a balance between high-frequency appearance and contiguous semantics. Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over label transfer methods on the challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360 enables omnidirectional rendering of high-fidelity, multi-view and spatiotemporally consistent appearance, semantic and instance labels. We make our code and data available at https://github.com/fuxiao0719/PanopticNeRF
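The learned semantic field described in the abstract is rendered the same way a NeRF renders color: per-sample semantic logits are alpha-composited along each ray using the density field. A minimal single-ray sketch, with illustrative shapes and names rather than the paper's actual implementation:

```python
import numpy as np

def render_semantics(sigmas, sem_logits, deltas):
    """Alpha-composite per-sample semantic logits along one ray.

    sigmas:     (S,)    volume densities at S samples along the ray
    sem_logits: (S, C)  per-sample logits over C semantic classes
    deltas:     (S,)    distances between adjacent samples

    Shapes/names are assumptions for illustration.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)        # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)       # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])    # T_i: transmittance before sample i
    weights = alphas * trans                       # standard NeRF compositing weights
    return weights @ sem_logits                    # (C,) rendered class scores
```

A softmax over the returned scores gives a per-pixel class distribution; because the compositing weights come from the density field, fusing noisy multi-view 2D semantics through this renderer is what lets 3D geometry filter 2D annotation noise.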
Problem

Research questions and friction points this paper is trying to address.

Generates panoramic 3D-to-2D labels for urban scenes
Combines 3D and 2D priors to enhance geometry and semantics
Improves label consistency and appearance for self-driving perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines 3D and 2D priors for label transfer
Uses hybrid MLP and hash grids for scene features
Filters noise via learned semantic field fusion
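The hybrid feature idea in the bullets above can be sketched as concatenating high-frequency features from a hashed voxel grid with smooth features from an MLP branch. The snippet below is a simplified, assumption-laden illustration (nearest-vertex lookup at a single level; real hash grids such as Instant-NGP trilinearly interpolate over eight vertices across many levels):

```python
import numpy as np

# Per-dimension primes for spatial hashing (as in Instant-NGP-style grids)
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_grid_lookup(xyz, table, resolution):
    """Nearest-vertex lookup in one level of a hashed feature grid.

    xyz:        (N, 3) points in [0, 1]^3
    table:      (T, F) learned feature table (T entries, F features each)
    resolution: grid resolution at this level
    """
    idx = np.floor(xyz * resolution).astype(np.uint64)   # (N, 3) vertex coords
    h = np.bitwise_xor.reduce(idx * PRIMES, axis=1)      # spatial hash per point
    return table[h % np.uint64(len(table))]              # (N, F) grid features

def hybrid_features(xyz, table, resolution, mlp_feat):
    # Concatenate high-frequency hash features with smooth MLP features,
    # mirroring the balance between appearance detail and contiguous semantics.
    return np.concatenate(
        [hash_grid_lookup(xyz, table, resolution), mlp_feat], axis=1
    )
```

The design intuition: hash grids change sharply between cells, which suits high-frequency appearance, while an MLP varies smoothly in space, which suits spatially contiguous semantic labels; combining both gives each rendering head the inductive bias it needs.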