FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Learning-based occupancy prediction methods depend on large-scale 3D annotations and ground-truth camera poses, which limits their generalization across environments. This work proposes FreeOcc, a training-free, open-vocabulary occupancy prediction framework that requires neither 3D labels nor ground-truth poses and builds globally consistent voxelized occupancy maps from monocular or RGB-D sequences alone. The pipeline combines a SLAM backbone, geometrically consistent 3D Gaussian mapping, semantic association using off-the-shelf vision-language models such as CLIP, and a probabilistic projection from Gaussians to voxels. On EmbodiedOcc-ScanNet, it improves both IoU and mIoU by more than 2× over prior self-supervised methods. On the newly introduced ReplicaOcc benchmark, it transfers zero-shot to novel environments and substantially outperforms both supervised and self-supervised baselines.
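For readers unfamiliar with the two metrics cited above, the following is a minimal sketch of how IoU and mIoU are commonly computed on voxel label grids. It assumes that label 0 denotes empty space and that every voxel is evaluated; the actual benchmark protocol (for example, restricting evaluation to visible voxels) may differ, and the function names are illustrative only.

```python
import numpy as np

def occupancy_iou(pred, gt, empty_label=0):
    """Geometric IoU: occupied vs. empty, ignoring the semantic class."""
    pred_occ = pred != empty_label
    gt_occ = gt != empty_label
    union = np.logical_or(pred_occ, gt_occ).sum()
    if union == 0:
        return 1.0  # both grids are entirely empty
    return float(np.logical_and(pred_occ, gt_occ).sum() / union)

def mean_iou(pred, gt, class_ids):
    """Semantic mIoU: per-class IoU averaged over classes present in pred or gt."""
    ious = []
    for c in class_ids:
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union > 0:  # skip classes absent from both grids
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example: four voxels, label 0 = empty, labels 1 and 2 = semantic classes.
pred = np.array([0, 1, 1, 2])
gt = np.array([0, 1, 2, 2])
iou = occupancy_iou(pred, gt)
miou = mean_iou(pred, gt, class_ids=[1, 2])
```

In the toy example the occupied regions coincide, so the geometric IoU is perfect, while one mislabeled voxel halves the IoU of each class.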

📝 Abstract

Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc, a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior approaches that require voxel-level supervision and ground-truth camera poses, FreeOcc operates without 3D annotations, pose ground truth, or any learning stage. FreeOcc incrementally builds a globally consistent occupancy map via a four-layer pipeline: a SLAM backbone estimates poses and sparse geometry; a geometrically consistent Gaussian update constructs dense 3D Gaussian maps; open-vocabulary semantics from off-the-shelf vision-language models are associated with Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and pose-agnostic, FreeOcc achieves over $2\times$ improvements in IoU and mIoU on EmbodiedOcc-ScanNet compared to prior self-supervised methods. We further introduce ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction, and show that FreeOcc transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines. Project page: https://the-masses.github.io/freeocc-web/.
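The last stage of the pipeline, the probabilistic Gaussian-to-occupancy projection, can be illustrated with a small sketch. This is not the authors' implementation: it assumes isotropic Gaussians, treats each Gaussian as an independent occupancy source (so a voxel is occupied unless every Gaussian "misses" it), and uses a hypothetical per-Gaussian class-probability array standing in for the vision-language semantics. All names and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np

def gaussians_to_occupancy(means, sigmas, opacities, class_probs,
                           grid_min, voxel_size, grid_shape, threshold=0.5):
    """Project isotropic 3D Gaussians onto a voxel grid (illustrative sketch).

    means: (G, 3) centers; sigmas: (G,) std devs; opacities: (G,) in [0, 1];
    class_probs: (G, C) per-Gaussian class scores (assumed given).
    Returns occupancy probabilities and labels (0 = empty, 1..C = classes).
    """
    # Voxel center coordinates, shape (V, 3).
    idx = np.stack(np.meshgrid(*[np.arange(n) for n in grid_shape],
                               indexing="ij"), axis=-1).reshape(-1, 3)
    centers = np.asarray(grid_min, dtype=float) + (idx + 0.5) * voxel_size

    # Contribution of each Gaussian to each voxel, shape (V, G).
    d2 = ((centers[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    w = opacities[None, :] * np.exp(-0.5 * d2 / sigmas[None, :] ** 2)

    # Assumed model: voxel is empty only if all Gaussians leave it empty.
    p_occ = 1.0 - np.prod(1.0 - w, axis=1)

    # Semantics: contribution-weighted vote over per-Gaussian class scores.
    labels = (w @ class_probs).argmax(axis=1) + 1
    labels[p_occ < threshold] = 0  # below threshold counts as empty
    return p_occ.reshape(grid_shape), labels.reshape(grid_shape)

# Toy example: one Gaussian centered in the first of two 1 m voxels.
p, labels = gaussians_to_occupancy(
    means=np.array([[0.5, 0.5, 0.5]]),
    sigmas=np.array([0.2]),
    opacities=np.array([0.9]),
    class_probs=np.array([[0.1, 0.9]]),
    grid_min=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(2, 1, 1),
)
```

The dense (V, G) distance matrix keeps the sketch short but scales poorly; a practical system would restrict each Gaussian to nearby voxels.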

Problem

Research questions and friction points this paper is trying to address.

occupancy prediction

open-vocabulary

training-free

3D perception

embodied AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free

open-vocabulary occupancy prediction

3D Gaussian mapping