Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing weakly supervised occupancy prediction methods largely assume rigid-body motion, so they struggle to capture the fine-grained deformations of nonrigid objects such as humans and to keep predictions temporally consistent in dynamic 3D scenes. This work proposes DeGO, a framework that explicitly decouples rigid and nonrigid motion in occupancy prediction. DeGO models shape deformation with deformable Gaussian primitives and adds a factorized 4D knowledge distillation mechanism that transfers cross-camera and cross-frame knowledge from the VGGT foundation model to improve temporal consistency. On the Occ3D-NuScenes benchmark, DeGO reports a 10.9% overall improvement and a 13.5% improvement on human instances, which the authors present as state of the art under weak supervision.

📝 Abstract

Understanding dynamic 3D environments is essential for safe autonomous driving, particularly when reasoning about human-centric, nonrigid agents. However, existing weakly supervised occupancy prediction frameworks predominantly assume rigid-body motion and rely on simple frame-to-frame offsets, limiting their ability to capture fine-grained deformations and maintain temporal coherence. To address this issue, we propose DeGO, a deformable Gaussian occupancy framework that unifies decoupled Gaussian deformation with factorized 4D foundation-model distillation. DeGO disentangles rigid and nonrigid motion, enabling each Gaussian primitive to evolve through both deformation and offset-based updates. In parallel, a factorized 4D distillation strategy transfers cross-camera and cross-frame knowledge from the VGGT foundation model, producing foundation-aligned features that enhance temporal consistency. Experiments on the Occ3D-NuScenes benchmark demonstrate that our method achieves state-of-the-art performance under weak supervision, delivering 13.5% gains on human-centric instances and 10.9% overall improvements. These results highlight the effectiveness of deformation-aware and foundation-guided occupancy modeling for dynamic scene understanding. The code is publicly available: https://github.com/vita-epfl/DeGO
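The decoupling described in the abstract can be illustrated with a small sketch. It is not taken from the DeGO code: the `Gaussian` class, the yaw-plus-translation rigid transform, and the per-Gaussian residual `deformations` are all assumptions chosen to show the idea that each primitive receives a shared offset-style update plus its own nonrigid deformation.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    # Center of the primitive (x, y, z). Scale, rotation, opacity and
    # semantics are omitted to keep the illustration small.
    mean: tuple


def rigid_update(mean, yaw, translation):
    """Hypothetical rigid part: rotate about the z axis, then translate."""
    x, y, z = mean
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def step(gaussians, yaw, translation, deformations):
    """Advance one frame.

    The rigid transform is shared by every Gaussian of the object;
    `deformations` holds one residual (dx, dy, dz) per Gaussian and stands in
    for the nonrigid part (assumed here to be a plain additive offset).
    """
    moved = []
    for g, d in zip(gaussians, deformations):
        rx, ry, rz = rigid_update(g.mean, yaw, translation)
        moved.append(Gaussian(mean=(rx + d[0], ry + d[1], rz + d[2])))
    return moved


# A pedestrian approximated by two Gaussians: torso and one arm.
pedestrian = [Gaussian((0.0, 0.0, 1.0)), Gaussian((0.0, 0.3, 1.0))]
# The whole body moves 1 m forward; only the arm swings by 0.2 m.
next_frame = step(
    pedestrian,
    yaw=0.0,
    translation=(1.0, 0.0, 0.0),
    deformations=[(0.0, 0.0, 0.0), (0.0, 0.2, 0.0)],
)
```

Keeping the two terms separate means a purely rigid object, such as a car, can be handled with all-zero deformations, while an articulated object reuses the same rigid path and adds residuals only where the shape changes.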

Problem

Research questions and friction points this paper is trying to address.

nonrigid motion

occupancy prediction

temporal coherence

dynamic 3D environments

weak supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deformable Gaussian

Nonrigid Motion

Factorized Distillation
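
As a rough illustration of factorized distillation, the sketch below splits a feature-alignment loss into a cross-camera term and a cross-frame term, the two kinds of knowledge the abstract says are transferred from VGGT. The loss form (matching pairwise cosine similarities), the function names, and the weights `w_cam` and `w_time` are assumptions for illustration only; the summary does not specify the objective the paper actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def relation_loss(student, teacher):
    """Mean squared gap between the pairwise similarities of two feature lists.

    Only similarities are compared, so student and teacher features may have
    different dimensions.
    """
    n = len(student)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        gap = cosine(student[i], student[j]) - cosine(teacher[i], teacher[j])
        total += gap * gap
    return total / len(pairs)


def factorized_distill_loss(student, teacher, w_cam=1.0, w_time=1.0):
    """Features are indexed as feats[frame][camera] -> vector.

    Cross-camera term: relations among cameras within each frame.
    Cross-frame term: relations among frames for each camera.
    """
    num_frames = len(student)
    num_cams = len(student[0])
    cam_term = sum(
        relation_loss(student[t], teacher[t]) for t in range(num_frames)
    ) / num_frames
    time_term = sum(
        relation_loss(
            [student[t][c] for t in range(num_frames)],
            [teacher[t][c] for t in range(num_frames)],
        )
        for c in range(num_cams)
    ) / num_cams
    return w_cam * cam_term + w_time * time_term
```

Factorizing along the two axes keeps the number of compared pairs linear in frames times cameras per axis, instead of comparing every camera-frame combination against every other.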