🤖 AI Summary
Dense semantic and instance-level 3D volumetric reconstruction, i.e., monocular/multi-view 3D panoptic scene completion, remains underexplored in autonomous driving. Method: We propose the first end-to-end differentiable framework for this task. It introduces differentiable object shape modeling; provides plug-and-play object modules and a panoptic fusion head that disentangle instance geometry using only sparse occupancy annotations, without LiDAR supervision; and integrates depth estimation, implicit shape representation, instance offset fields, and voxel rendering. Contribution/Results: Our method is compatible with mainstream 3D occupancy models, achieves an 8.2% improvement in Panoptic Quality (PQ) on SemanticKITTI and nuScenes, and delivers the first real-time, camera-only 3D panoptic scene completion system.
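To make the "instance offset fields" idea concrete, here is a minimal sketch of how per-voxel offsets can be turned into instance IDs: each occupied "thing" voxel predicts a 3D offset pointing toward its instance center, and voxels whose shifted centers land close together are grouped into one instance. This is a generic offset-based grouping scheme for illustration only; the function name, the greedy merging strategy, and the `merge_radius` threshold are assumptions, not the paper's actual implementation.

```python
import numpy as np

def group_instances(voxel_coords, offsets, merge_radius=1.5):
    """Cluster "thing" voxels into instances.

    Each voxel center is shifted by its predicted offset; shifted points
    within merge_radius of an existing cluster center join that cluster,
    otherwise they start a new one (greedy, first-come clustering).

    voxel_coords: (N, 3) coordinates of occupied "thing" voxels
    offsets:      (N, 3) predicted offsets toward instance centers
    returns:      (N,) integer instance id per voxel
    """
    shifted = voxel_coords.astype(np.float64) + np.asarray(offsets, dtype=np.float64)
    centers = []  # running list of discovered instance centers
    ids = np.empty(len(shifted), dtype=np.int64)
    for i, point in enumerate(shifted):
        for k, center in enumerate(centers):
            if np.linalg.norm(point - center) < merge_radius:
                ids[i] = k
                break
        else:
            ids[i] = len(centers)
            centers.append(point)
    return ids
```

With perfect offsets, voxels of the same object collapse onto a single point and receive the same ID; real predictions are noisy, which is why a merge radius (or a proper clustering step such as mean-shift) is needed.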
📝 Abstract
Autonomous vehicles need a complete map of their surroundings to plan and act. This has sparked research into the tasks of 3D occupancy prediction, 3D scene completion, and 3D panoptic scene completion, which predict a dense map of the ego vehicle's surroundings as a voxel grid. Scene completion extends occupancy prediction by predicting occluded regions of the voxel grid, and panoptic scene completion further extends this task by also distinguishing object instances within the same class; both aspects are crucial for path planning and decision-making. However, 3D panoptic scene completion is currently underexplored. This work introduces a novel framework for 3D panoptic scene completion that extends existing 3D semantic scene completion models. We propose an Object Module and Panoptic Module that can easily be integrated with 3D occupancy and scene completion methods presented in the literature. Our approach leverages the available annotations in occupancy benchmarks, allowing individual object shapes to be learned as a differentiable problem. The code is available at https://github.com/nicolamarinello/OffsetOcc.
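A panoptic fusion step of the kind the abstract describes can be sketched as follows: per-voxel semantic labels are combined with per-voxel instance IDs into a single panoptic label, where "stuff" classes (road, vegetation, ...) carry no instance identity and "thing" classes keep theirs. The encoding `class_id * max_instances + instance_id`, the `THING_CLASSES` set, and all names here are illustrative assumptions, not the paper's Panoptic Module.

```python
import numpy as np

# Hypothetical class ids treated as countable "things" (e.g. car, pedestrian).
THING_CLASSES = {1, 2}

def panoptic_fuse(semantics, instance_ids, max_instances=1000):
    """Fuse semantic and instance predictions into one panoptic label grid.

    Encodes each voxel as class_id * max_instances + instance_id, a common
    panoptic-label convention; "stuff" voxels get instance id 0.
    """
    sem = np.asarray(semantics)
    inst = np.asarray(instance_ids)
    is_thing = np.isin(sem, list(THING_CLASSES))
    return sem * max_instances + np.where(is_thing, inst, 0)
```

For example, a voxel with semantic class 1 and instance ID 2 maps to panoptic label 1002, while any voxel of a stuff class maps to `class_id * 1000` regardless of its instance prediction.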