AI Summary
Modeling and controlling robotic manipulation of granular media (e.g., beans, rice) remains challenging due to large particle counts, complex inter-particle interactions, and highly variable system states. To address this, we propose the first vision-based differentiable dynamics model grounded in Gaussian Splatting, augmented with physics-informed priors to improve generalization. Our approach integrates explicit 3D scene reconstruction, end-to-end visual dynamics learning, and gradient-based visual model predictive control (Visual MPC), enabling zero-shot cross-environment transfer and planning for complex stacking tasks. Leveraging simulation-to-real co-training, our method achieves significantly higher state prediction accuracy and manipulation success rates than prior approaches. Notably, it is the first to demonstrate zero-shot generalization to unseen granular manipulation scenes without any task-specific adaptation or real-world fine-tuning.
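The core idea behind gradient-based Visual MPC works like this. You differentiate a planning cost through a rollout of a differentiable dynamics model, improve the whole action sequence by gradient descent, execute only the first action, and re-plan. The sketch below is a minimal illustration under loud assumptions. It uses a hand-written toy model (a bounded push that shifts a pile's 2D centroid) in place of the learned Gaussian-splatting model. The names `GAIN`, `step`, `cost_and_grad`, `plan`, and `mpc`, along with all parameter values, are hypothetical and are not the authors' code.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, ...]

# Hypothetical constant: fraction of a saturated push that moves the pile.
GAIN = 0.8


def step(state: Vec, action: Vec) -> Vec:
    """Toy differentiable dynamics (NOT the learned model): a push,
    squashed by tanh so it stays bounded, shifts the pile's 2D centroid."""
    return tuple(s + GAIN * math.tanh(a) for s, a in zip(state, action))


def cost_and_grad(state: Vec, actions: List[Vec], goal: Vec, reg: float = 0.01):
    """Squared distance of the final state to the goal plus an action penalty,
    and its exact gradient w.r.t. every action (reverse pass over the rollout)."""
    final = state
    for act in actions:  # forward rollout through the dynamics model
        final = step(final, act)
    err = [f - g for f, g in zip(final, goal)]
    cost = sum(e * e for e in err) + reg * sum(a * a for act in actions for a in act)
    # d(s_{t+1})/d(s_t) is the identity for this toy model, so the adjoint
    # dC/d(s_T) = 2 * (s_T - goal) carries back unchanged to every step.
    adjoint = [2.0 * e for e in err]
    grads = [
        tuple(lam * GAIN * (1.0 - math.tanh(a) ** 2) + 2.0 * reg * a
              for lam, a in zip(adjoint, act))
        for act in actions
    ]
    return cost, grads


def plan(state: Vec, goal: Vec, horizon: int = 3,
         iters: int = 200, lr: float = 0.2) -> List[Vec]:
    """Gradient descent on the whole action sequence, starting from zero pushes."""
    actions = [tuple(0.0 for _ in state)] * horizon
    for _ in range(iters):
        _, grads = cost_and_grad(state, actions, goal)
        actions = [tuple(a - lr * g for a, g in zip(act, grad))
                   for act, grad in zip(actions, grads)]
    return actions


def mpc(state: Vec, goal: Vec, steps: int = 20) -> Vec:
    """Receding horizon: re-plan at every step and execute only the first action."""
    for _ in range(steps):
        first_action = plan(state, goal)[0]
        state = step(state, first_action)  # stands in for acting and re-observing
    return state


final = mpc((0.0, 0.0), goal=(1.0, -0.5))
print(final)
```

In this toy setting, the centroid converges toward the goal `(1.0, -0.5)` over the receding-horizon steps. With a learned model, an automatic-differentiation framework would supply the gradient that is derived by hand here.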
Abstract
Recent advances in learned 3D representations have enabled significant progress on complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice remains challenging. The difficulties include the intricate physics of particle interactions, the high-dimensional and partially observable state, the inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation because they lack inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model to manipulate granular media via model predictive control (MPC). Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and to generalize zero-shot to new environments. We also show significant improvements in prediction and manipulation performance over existing granular media manipulation methods.