AI Summary
Existing weakly supervised occupancy prediction methods largely assume rigid-body motion, so they struggle to capture the fine-grained deformations and temporal consistency of nonrigid objects such as humans in dynamic 3D scenes. This work proposes DeGO, a framework that explicitly decouples rigid and nonrigid motion in occupancy prediction. DeGO uses deformable Gaussian representations to model shape deformation and introduces a factorized 4D knowledge distillation mechanism that transfers cross-camera and cross-frame knowledge from the VGGT foundation model to improve temporal consistency. On the Occ3D-NuScenes benchmark, DeGO reports a 10.9% overall improvement and a 13.5% gain on human-centric instances, which the authors present as state-of-the-art performance under weak supervision.
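To make the idea of a distillation objective factorized over cameras and frames more concrete, here is a minimal sketch in plain Python. It is not the loss used by DeGO, which this summary does not specify. The function names, the mean pooling, and the cosine distance are all assumptions chosen for illustration. Student and teacher features are indexed as `features[frame][camera]`; one term compares features pooled across cameras within each frame, the other compares features pooled across frames for each camera.

```python
import math
from typing import List, Sequence

Feature = Sequence[float]


def cosine_distance(a: Feature, b: Feature) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_feature(features: Sequence[Feature]) -> List[float]:
    """Element-wise mean of a list of equally sized feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def factorized_distillation_loss(student, teacher) -> float:
    """Hypothetical two-term loss; features are indexed [frame][camera]."""
    num_frames = len(student)
    num_cams = len(student[0])

    # Cross-camera term: pool over cameras inside each frame, then compare.
    cross_camera = sum(
        cosine_distance(mean_feature(student[t]), mean_feature(teacher[t]))
        for t in range(num_frames)
    ) / num_frames

    # Cross-frame term: pool over frames for each camera, then compare.
    cross_frame = sum(
        cosine_distance(
            mean_feature([student[t][c] for t in range(num_frames)]),
            mean_feature([teacher[t][c] for t in range(num_frames)]),
        )
        for c in range(num_cams)
    ) / num_cams

    return cross_camera + cross_frame
```

In this sketch the loss is near zero when the student reproduces the teacher features and grows as the pooled features point in different directions. In a real pipeline the teacher features would come from the frozen foundation model and the two terms would likely be weighted separately.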
Abstract
Understanding dynamic 3D environments is essential for safe autonomous driving, particularly when reasoning about human-centric, nonrigid agents. However, existing weakly supervised occupancy prediction frameworks predominantly assume rigid-body motion and rely on simple frame-to-frame offsets, limiting their ability to capture fine-grained deformations and maintain temporal coherence. To address this issue, we propose DeGO, a deformable Gaussian occupancy framework that unifies decoupled Gaussian deformation with factorized 4D foundation-model distillation. DeGO disentangles rigid and nonrigid motion, enabling each Gaussian primitive to evolve through both deformation and offset-based updates. In parallel, a factorized 4D distillation strategy transfers cross-camera and cross-frame knowledge from the VGGT foundation model, producing foundation-aligned features that enhance temporal consistency. Experiments on the Occ3D-NuScenes benchmark demonstrate that our method achieves state-of-the-art performance under weak supervision, delivering a 13.5% gain on human-centric instances and a 10.9% overall improvement. These results highlight the effectiveness of deformation-aware and foundation-guided occupancy modeling for dynamic scene understanding. The code is publicly available at https://github.com/vita-epfl/DeGO
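The decoupled update described above can be illustrated with a small sketch in plain Python. It is only a toy under stated assumptions, not the released implementation: the `Gaussian` class, the `step` function, and the choice to apply both terms to the Gaussian center alone are hypothetical. The point is that each primitive receives two separately predicted terms per time step, a frame-to-frame offset for rigid motion and a deformation residual for nonrigid motion.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise sum of two 3D vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


@dataclass(frozen=True)
class Gaussian:
    """Toy primitive: only the center is modeled in this sketch."""
    mean: Vec3


def step(
    gaussians: List[Gaussian],
    offsets: List[Vec3],       # assumed rigid, frame-to-frame motion per primitive
    deformations: List[Vec3],  # assumed nonrigid residual per primitive
) -> List[Gaussian]:
    """Advance every Gaussian by its offset, then by its deformation."""
    updated = []
    for gaussian, offset, deformation in zip(gaussians, offsets, deformations):
        moved = add(gaussian.mean, offset)    # rigid part
        deformed = add(moved, deformation)    # nonrigid part
        updated.append(Gaussian(mean=deformed))
    return updated
```

For a rigid object such as a parked car, the deformation term would be expected to stay close to zero, while for a walking pedestrian it would carry the limb motion that a single offset cannot express. Keeping the two terms separate is what allows them to be supervised or regularized differently.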