🤖 AI Summary
Physics-based simulation typically relies on explicit parameter inputs and incurs high computational costs. Method: This paper proposes an end-to-end framework that implicitly infers system physical properties from short videos, integrating video representation learning, graph neural networks (GNNs), and a differentiable physics simulator. It establishes the first direct mapping from raw video to physical encoding and uncovers a linear physical correspondence between the learned embedding and motion dynamics. Contribution/Results: Without manual parameter specification, the model drives a GNN to generate high-fidelity trajectories. On diverse rigid- and soft-body dynamics tasks, physical parameter estimation error decreases by 42%, trajectory prediction accuracy approaches ground-truth simulation, and inference speed improves by 20×—significantly reducing dependence on prior modeling assumptions and computational resources.
📝 Abstract
Lifelike visualizations in design, cinematography, and gaming rely on precise physics simulations, typically requiring extensive computational resources and detailed physical input. This paper presents a method that can infer a system's physical properties from a short video, eliminating the need for explicit parameter input, provided it is close to the training condition. The learned representation is then used within a Graph Network-based Simulator to emulate the trajectories of physical systems. We demonstrate that the video-derived encodings effectively capture the physical properties of the system and showcase a linear dependence between some of the encodings and the system's motion.