Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

๐Ÿ“… 2025-06-18
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Deformable objects (e.g., ropes, cloth, plush toys) are hard to model from RGB-D video because their dynamics are complex, their states are difficult to estimate, and camera views are often sparse; this paper proposes an end-to-end neural dynamics framework to address these challenges. Methodologically, it introduces a novel particle-grid hybrid representation: particles capture local deformation, while a 3D voxelized grid ensures global spatial continuity. Combined with Gaussian Splatting rendering and neural ODEs, this enables action-conditioned 3D video generation and digital-twin modeling. For the first time, category-level generalizable dynamics modeling is achieved from only single- or dual-view RGB-D inputs, without requiring object-instance priors. Evaluated on diverse soft-body objects, the approach significantly outperforms existing learning-based models and physics simulators, improving motion prediction accuracy by 32% under sparse-view settings, and enables goal-directed robotic manipulation planning.
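
The coupling of particles and a voxel grid described above resembles the scatter/gather step used in particle-in-cell style solvers: particle features are splatted onto the grid, processed there, and interpolated back to particles. The sketch below is a minimal geometric illustration of that data flow, not the authors' implementation; the nearest-cell splatting (real systems typically use trilinear kernels), the function names, and the feature shapes are all assumptions.

```python
import numpy as np

def particle_to_grid(positions, features, grid_res, cell_size):
    """Scatter per-particle features (N, C) onto a dense voxel grid.
    Nearest-cell splatting for brevity; trilinear weights are the
    usual choice in practice."""
    grid = np.zeros((grid_res, grid_res, grid_res, features.shape[1]))
    counts = np.zeros((grid_res, grid_res, grid_res, 1))
    idx = np.clip((positions / cell_size).astype(int), 0, grid_res - 1)
    for (i, j, k), f in zip(idx, features):
        grid[i, j, k] += f
        counts[i, j, k] += 1.0
    return grid / np.maximum(counts, 1.0)  # average features per cell

def grid_to_particle(grid, positions, cell_size):
    """Gather grid features back to particle locations (nearest cell)."""
    res = grid.shape[0]
    idx = np.clip((positions / cell_size).astype(int), 0, res - 1)
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]
```

In the paper's learned setting, a network would operate on the grid features between the scatter and gather steps; here the round trip is purely geometric to show where spatial continuity comes from.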

๐Ÿ“ Abstract
Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splattings for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects -- such as ropes, cloths, stuffed animals, and paper bags -- from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at https://kywind.github.io/pgnd.
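
The model-based planning use mentioned at the end of the abstract typically amounts to sampling candidate action sequences, rolling each one out through the learned dynamics, and scoring the predicted particle states against a goal configuration. Below is a minimal random-shooting sketch of that loop; the `dynamics_step` callable, the action bounds and shapes, and the Chamfer-style cost are assumptions for illustration, not the paper's interface.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between particle sets a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def plan(dynamics_step, particles, goal, horizon=10, n_samples=128, act_dim=3):
    """Random-shooting planner: sample action sequences, roll out the
    learned model, keep the sequence whose final state is closest to goal."""
    best_cost, best_actions = np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        state = particles.copy()
        for a in actions:
            state = dynamics_step(state, a)  # learned particle-grid dynamics
        cost = chamfer(state, goal)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost
```

More sample-efficient planners (e.g., CEM or gradient-based trajectory optimization) slot into the same rollout-and-score structure.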
Problem

Research questions and friction points this paper is trying to address.

Modeling deformable object dynamics from sparse RGB-D videos
Representing both global shape and dense local motion within a single model
Generalizing to unseen object instances while learning efficiently from limited data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid particle-grid neural dynamics framework
Gaussian Splattings for visual rendering
Learning from sparse-view RGB-D recordings
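
Learning from RGB-D recordings presupposes converting each depth frame into a 3D particle set. A standard pinhole back-projection, sketched below, is one way to do this; the intrinsics names (`fx`, `fy`, `cx`, `cy`) follow common convention and are not taken from the paper.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W), in meters, to an (N, 3) point
    cloud in the camera frame using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0  # drop missing depth readings
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]
```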