FreeOcc: Training-free Panoptic Occupancy Prediction via Foundation Models

๐Ÿ“… 2026-03-06
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work proposes the first training-free framework for panoptic 3D occupancy prediction, addressing the reliance of existing vision-only methods on costly 3D supervision or target-domain training when deployed in novel environments. By integrating a promptable foundation segmentation model with a reconstruction-based foundation model, the approach recovers both semantic and geometric information from multi-view images without supervision, enabling instance-aware 3D scene understanding. The method incorporates depth and confidence filtering, temporal consistency, and deterministic voxel refinement. It achieves 16.9 mIoU and 16.5 RayIoU (zero-shot) on Occ3D-nuScenes; with pseudo-label fine-tuning, performance improves to 21.1 RayIoU and 3.9 RayPQ, substantially outperforming existing training-free approaches.

๐Ÿ“ Abstract
Semantic and panoptic occupancy prediction for road scene analysis provides a dense 3D representation of the ego vehicle's surroundings. Current camera-only approaches typically rely on costly dense 3D supervision or require training models on data from the target domain, limiting deployment in unseen environments. We propose FreeOcc, a training-free pipeline that leverages pretrained foundation models to recover both semantics and geometry from multi-view images. FreeOcc extracts per-view panoptic priors with a promptable foundation segmentation model and prompt-to-taxonomy rules, and reconstructs metric 3D points with a reconstruction foundation model. Depth- and confidence-aware filtering lifts reliable labels into 3D, which are fused over time and voxelized with a deterministic refinement stack. For panoptic occupancy, instances are recovered by fitting and merging robust current-view 3D box candidates, enabling instance-aware occupancy without any learned 3D model. On Occ3D-nuScenes, FreeOcc achieves 16.9 mIoU and 16.5 RayIoU training-free, on par with state-of-the-art weakly supervised methods. When employed as a pseudo-label generation pipeline for training downstream models, it achieves 21.1 RayIoU, surpassing the previous state-of-the-art weakly supervised baseline. Furthermore, FreeOcc sets new baselines for both training-free and weakly supervised panoptic occupancy prediction, achieving 3.1 RayPQ and 3.9 RayPQ, respectively. These results highlight foundation-model-driven perception as a practical route to training-free 3D scene understanding.
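The abstract's lifting step (confidence- and depth-gated label lifting followed by deterministic voxelization) can be sketched as follows. This is a hypothetical minimal illustration, not the paper's implementation: the function name, thresholds, and majority-vote refinement are assumptions for exposition.

```python
import numpy as np

def lift_and_voxelize(points, labels, conf, depth,
                      conf_thresh=0.5, max_depth=40.0, voxel_size=0.4):
    """Hypothetical sketch: gate 3D points by confidence and depth,
    then voxelize with a deterministic per-voxel majority label vote.

    points: (N, 3) metric 3D points from the reconstruction model
    labels: (N,) integer semantic labels from the segmentation priors
    conf:   (N,) per-point confidence scores
    depth:  (N,) per-point depth values
    Returns a dict mapping voxel index (i, j, k) -> majority label.
    """
    # Keep only reliable points (confidence and depth gates).
    keep = (conf >= conf_thresh) & (depth <= max_depth)
    pts, lbl = points[keep], labels[keep]

    # Quantize surviving points to integer voxel indices.
    vox = np.floor(pts / voxel_size).astype(np.int64)

    # Collect labels per voxel.
    per_voxel = {}
    for v, l in zip(map(tuple, vox), lbl):
        per_voxel.setdefault(v, []).append(int(l))

    # Deterministic majority vote (bincount+argmax breaks ties by
    # smallest label, so the result is reproducible).
    return {v: int(np.bincount(ls).argmax()) for v, ls in per_voxel.items()}
```

In the full pipeline these voxel labels would additionally be fused across timestamps before refinement; this sketch covers only the single-frame lifting and voting.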
Problem

Research questions and friction points this paper is trying to address.

panoptic occupancy prediction
training-free
3D scene understanding
foundation models
camera-only perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
foundation models
panoptic occupancy
3D scene understanding
pseudo-label generation
๐Ÿ”Ž Similar Papers
No similar papers found.
Andrew Caunes
Logiroad, Nantes, France; LS2N - Ecole Centrale de Nantes, France
Thierry Chateau
Logiroad, Nantes, France
Vincent Frรฉmont
Vincent Frรฉmont
Full Professor, Centrale Nantes, ARMEN team @ LS2N Lab, UMR 6004
Autonomous Mobile Robotics, Vision for Robotics, Machine Learning, Multi-sensor Fusion