Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation

📅 2026-05-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing weakly supervised occupancy prediction methods struggle to model non-rigid motion: they fail to capture the fine-grained deformations of objects such as humans in dynamic 3D scenes and to keep predictions temporally consistent. This work proposes DeGO, a framework that explicitly decouples rigid and non-rigid motion in occupancy prediction for the first time. DeGO uses deformable Gaussian representations to model shape deformation and introduces a factorized 4D knowledge distillation mechanism that aligns multi-view and temporal features with those of the VGGT foundation model, improving both temporal consistency and deformation expressiveness. On the Occ3D-NuScenes benchmark, DeGO achieves a 10.9% overall performance gain and a 13.5% improvement in human instance occupancy prediction, establishing a new state of the art under weak supervision.
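To make the idea of a distillation objective factorized over views and time more concrete, here is a minimal sketch in plain Python. It is not the DeGO implementation: the relational (pairwise cosine similarity) form of the loss, the function names, the weights `w_cam` and `w_time`, and the `features[frame][camera]` layout are all assumptions made for illustration. The teacher features stand in for whatever a foundation model such as VGGT would provide.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def relation_loss(student_group, teacher_group):
    """Mean squared gap between student and teacher pairwise similarities."""
    pairs = list(combinations(range(len(student_group)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        gap = (cosine(student_group[i], student_group[j])
               - cosine(teacher_group[i], teacher_group[j]))
        total += gap * gap
    return total / len(pairs)


def factorized_distillation_loss(student, teacher, w_cam=1.0, w_time=1.0):
    """Hypothetical factorized loss; features are indexed as [frame][camera]."""
    n_frames = len(student)
    n_cams = len(student[0])
    # Cross-camera term: compare camera-to-camera relations within each frame.
    cam_term = sum(
        relation_loss(student[t], teacher[t]) for t in range(n_frames)
    ) / n_frames
    # Cross-frame term: compare frame-to-frame relations within each camera.
    time_term = sum(
        relation_loss(
            [student[t][c] for t in range(n_frames)],
            [teacher[t][c] for t in range(n_frames)],
        )
        for c in range(n_cams)
    ) / n_cams
    return w_cam * cam_term + w_time * time_term
```

Because the sketch compares similarity structure rather than raw features, the student and teacher vectors may have different dimensions. Splitting the objective into a per-frame term and a per-camera term avoids comparing every camera-frame pair with every other one, which is one plausible reading of "factorized".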
πŸ“ Abstract
Understanding dynamic 3D environments is essential for safe autonomous driving, particularly when reasoning about human-centric, nonrigid agents. However, existing weakly supervised occupancy prediction frameworks predominantly assume rigid-body motion and rely on simple frame-to-frame offsets, limiting their ability to capture fine-grained deformations and maintain temporal coherence. To address this issue, we propose DeGO, a deformable Gaussian occupancy framework that unifies decoupled Gaussian deformation with factorized 4D foundation-model distillation. DeGO disentangles rigid and nonrigid motion, enabling each Gaussian primitive to evolve through both deformation and offset-based updates. In parallel, a factorized 4D distillation strategy transfers cross-camera and cross-frame knowledge from the VGGT foundation model, producing foundation-aligned features that enhance temporal consistency. Experiments on the Occ3D-NuScenes benchmark demonstrate that our method achieves state-of-the-art performance under weak supervision, delivering 13.5% gains on human-centric instances and 10.9% overall improvements. These results highlight the effectiveness of deformation-aware and foundation-guided occupancy modeling for dynamic scene understanding. The code is publicly available: https://github.com/vita-epfl/DeGO
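The abstract's statement that each Gaussian primitive evolves through both deformation and offset-based updates can be illustrated with a small sketch. This is an assumption-laden toy, not the released DeGO code: it treats the rigid part as a yaw rotation plus a translation shared by all Gaussians of an object, and the nonrigid part as a per-Gaussian residual added afterwards. The names `rigid_transform` and `update_means`, and the order in which the two components are composed, are hypothetical.

```python
import math


def rigid_transform(point, yaw, translation):
    """Rotate a 3D point about the z-axis by `yaw` radians, then translate it."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def update_means(means, yaw, translation, deformations):
    """Advance Gaussian means by one step with decoupled motion components.

    means        : list of (x, y, z) Gaussian centers for one object
    yaw, translation : rigid motion shared by every Gaussian of the object
    deformations : one (dx, dy, dz) nonrigid residual per Gaussian
    """
    if len(means) != len(deformations):
        raise ValueError("need exactly one deformation per Gaussian")
    updated = []
    for mean, delta in zip(means, deformations):
        rx, ry, rz = rigid_transform(mean, yaw, translation)
        # The nonrigid residual is applied after the shared rigid motion.
        updated.append((rx + delta[0], ry + delta[1], rz + delta[2]))
    return updated
```

For example, a pedestrian walking forward would share one translation across all of its Gaussians, while the Gaussians covering a swinging arm would carry nonzero residuals. With all residuals set to zero, the update reduces to the purely rigid, offset-based motion model that the abstract describes as the limitation of earlier frameworks.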
Problem

Research questions and friction points this paper is trying to address.

nonrigid motion
occupancy prediction
temporal coherence
dynamic 3D environments
weak supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deformable Gaussian
Nonrigid Motion
Factorized Distillation
Occupancy Prediction
4D Foundation Model