🤖 AI Summary
This work addresses the limited generalizability of existing 3D occupancy prediction methods, which rely heavily on precise sensor calibration and in-domain annotations, rendering them ill-suited for unconstrained urban environments. To overcome this, we propose OccAny, the first universal 3D occupancy prediction model that operates without requiring sensor calibration and supports diverse multi-view inputs, including monocular, sequential, and surround-view configurations. Our key innovations include a unified 3D occupancy framework, a Segmentation Forcing mechanism that enhances occupancy quality and enables mask-level semantic prediction, and a test-time geometric completion strategy based on novel view synthesis. Extensive experiments demonstrate that OccAny consistently outperforms current visual-geometry baselines across all three input settings on two major urban scene datasets, achieving performance on par with in-domain self-supervised approaches.
📝 Abstract
Relying on in-domain annotations and precise sensor-rig priors, existing 3D occupancy prediction methods are limited in both scalability and out-of-domain generalization. While recent visual geometry foundation models exhibit strong generalization capabilities, they are designed for general-purpose use and lack one or more key ingredients required for urban occupancy prediction, namely metric prediction, geometry completion in cluttered scenes, and adaptation to urban scenarios. We address this gap and present OccAny, the first unconstrained urban 3D occupancy model capable of operating on out-of-domain uncalibrated scenes to predict and complete metric occupancy coupled with segmentation features. OccAny is versatile and can predict occupancy from sequential, monocular, or surround-view images. Our contributions are three-fold: (i) we propose the first generalized 3D occupancy framework with (ii) Segmentation Forcing, which improves occupancy quality while enabling mask-level prediction, and (iii) a Novel View Rendering pipeline that infers novel-view geometry to enable test-time view augmentation for geometry completion. Extensive experiments demonstrate that OccAny outperforms all visual geometry baselines on the 3D occupancy prediction task, while remaining competitive with in-domain self-supervised methods across three input settings on two established urban occupancy prediction datasets. Our code is available at https://github.com/valeoai/OccAny.