AI Summary
Learning world models of deformable objects from real-world videos requires jointly modeling geometry, appearance, dynamics, interactions, and material behavior, which is challenging because of high-dimensional deformations and complex physical responses. This work proposes a unified interactive world model that integrates physical and neural representations to infer implicit physical states from visual observations, roll them forward under novel interactions, and render the resulting dynamics with high fidelity. The method combines structured physics simulation with a neural residual that compensates for unmodeled effects, represents sparse hand motion as distributed soft actuators, employs spatially varying constitutive experts to capture heterogeneous material responses, and generates 4D neural appearance driven by the predicted physical evolution. Experiments on real deformable-object videos show that the model outperforms existing approaches while supporting novel action rollouts, material editing, and dynamic novel-view synthesis.
Abstract
World models for deformable objects should recover not only geometry and appearance, but also the underlying physical dynamics, interaction grounding, and material behavior. Learning such a model from real videos is challenging because deformable linear, planar, and volumetric objects evolve under high-dimensional deformation, noisy interactions, and complex material response. The model must therefore infer a physical state from visual observations, roll it forward under new interactions, and render the resulting dynamics with high visual fidelity. We present DeformMaster, a video-derived interactive physics-neural world model that turns real interaction videos into an online interactive model of deformable objects within a unified dynamics-and-appearance framework. DeformMaster preserves a structured physical rollout while using a neural residual to compensate for unmodeled effects, grounds sparse hand motion as a distributed compliant actuator for hand-continuum interaction, represents material response with spatially varying constitutive experts, and drives high-fidelity 4D appearance from the predicted physical evolution. Experiments on real-world deformable-object sequences demonstrate DeformMaster's ability to roll out future dynamics and render dynamic appearance, outperforming state-of-the-art baselines while supporting novel action rollout, material-parameter variation, and dynamic novel-view synthesis.
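To make the idea of a structured rollout with a residual correction concrete, the following is a minimal toy sketch and not DeformMaster's implementation. It assumes a 1D chain of unit-mass particles joined by linear springs, a Gaussian weighting that spreads a single hand point over nearby particles as a soft actuator, and a plain Python callable standing in for the learned neural residual. All names (`physics_step`, `soft_actuator_forces`, `rollout`, `zero_residual`) and all parameter values are hypothetical.

```python
import math


def physics_step(x, v, ext, rest=1.0, k=50.0, damping=0.02, dt=0.01):
    """One semi-implicit Euler step for a 1D chain of unit-mass particles."""
    f = list(ext)  # start from the external (actuator) forces
    for i in range(len(x) - 1):
        stretch = (x[i + 1] - x[i]) - rest
        f[i] += k * stretch       # spring pulls particle i toward i+1
        f[i + 1] -= k * stretch   # equal and opposite force on i+1
    v_new = [(1.0 - damping) * (vi + dt * fi) for vi, fi in zip(v, f)]
    x_new = [xi + dt * vi for xi, vi in zip(x, v_new)]
    return x_new, v_new


def soft_actuator_forces(x, hand_pos, stiffness=20.0, radius=0.5):
    """Spread one sparse hand point over nearby particles (assumed Gaussian weights)."""
    weights = [math.exp(-((xi - hand_pos) ** 2) / (2.0 * radius ** 2)) for xi in x]
    return [stiffness * w * (hand_pos - xi) for w, xi in zip(weights, x)]


def zero_residual(x, v):
    """Placeholder for a learned residual: predicts no position correction."""
    return [0.0] * len(x)


def rollout(x, v, hand_traj, residual):
    """Roll the state forward: physics step first, then the residual correction."""
    states = []
    for hand_pos in hand_traj:
        ext = soft_actuator_forces(x, hand_pos)
        x, v = physics_step(x, v, ext)
        correction = residual(x, v)  # stands in for a neural network
        x = [xi + ci for xi, ci in zip(x, correction)]
        states.append(list(x))
    return states


if __name__ == "__main__":
    # A chain at rest, pulled by a hand point held just beyond its last particle.
    trajectory = rollout([0.0, 1.0, 2.0, 3.0], [0.0] * 4, [3.5] * 10, zero_residual)
    print(trajectory[-1])
```

The point of this structure is that the physics step stays interpretable and editable (for example, changing `k` to vary the material), while the residual only needs to account for whatever the simplified physics misses.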