🤖 AI Summary
This work addresses physical trajectory inference in scenarios with multiple interacting objects. We propose an object-centric conditional denoising diffusion model for this task. Unlike conventional autoregressive approaches, which can be conditioned only on initial states, our model supports multi-point constraints: it can be conditioned on states, such as initial or goal states, of any object at any time step. To this end, the architecture combines object-centric representations with a network that is translation equivariant over time and permutation equivariant over objects. We demonstrate the model on tasks with multiple conditions and examine its performance when the number of objects and the trajectory length are changed at inference time.
📝 Abstract
Reasoning about the trajectories of multiple, interacting objects is integral to physical reasoning tasks in machine learning. This involves conditions imposed on the objects at different time steps, for instance initial states or desired goal states. Existing approaches in physical reasoning generally rely on autoregressive modeling, which can only be conditioned on initial states, but not on later states. In fields such as planning for reinforcement learning, similar challenges are being addressed with denoising diffusion models. In this work, we propose an object-centric denoising diffusion model architecture for physical reasoning that is translation equivariant over time, permutation equivariant over objects, and can be conditioned on arbitrary time steps for arbitrary objects. We demonstrate how this model can solve tasks with multiple conditions and examine its performance when changing object numbers and trajectory lengths during inference.
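The two equivariance properties and the per-object, per-time-step conditioning named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's architecture: the layer below is a hypothetical stand-in (a DeepSets-style mean over objects plus a circular convolution over time), circular padding is assumed so that the time-shift check holds exactly, and the names `equivariant_layer` and `apply_conditions` are invented for illustration. The conditioning shown is a simple mask-and-overwrite step, which is one common way to impose constraints on a diffusion sample and may differ from what the authors do.

```python
import numpy as np

def equivariant_layer(x, w_self, w_pool, kernel):
    """Toy layer on trajectories x of shape (objects, time, features).

    Assumption: illustrative only, not the layer used in the paper.
    """
    # Per-object term plus a mean over all objects: permuting the objects
    # permutes the output rows in the same way (permutation equivariance).
    pooled = x.mean(axis=0, keepdims=True)
    h = x @ w_self + pooled @ w_pool
    # Three-tap circular convolution over time: shifting the input in time
    # shifts the output by the same amount (translation equivariance).
    out = np.zeros_like(h)
    for offset, weight in zip((-1, 0, 1), kernel):
        out += weight * np.roll(h, shift=offset, axis=1)
    return np.tanh(out)

def apply_conditions(x, cond_values, cond_mask):
    """Overwrite the entries selected by cond_mask with the given states."""
    return np.where(cond_mask, cond_values, x)

# Small demonstration with 4 objects, 6 time steps, 3 state features.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 3))
w_self = rng.normal(size=(3, 3))
w_pool = rng.normal(size=(3, 3))
kernel = (0.2, 0.5, 0.3)
y = equivariant_layer(x, w_self, w_pool, kernel)

# Condition object 0 on its initial state and object 2 on its final state.
mask = np.zeros((4, 6, 1), dtype=bool)
mask[0, 0] = True
mask[2, -1] = True
conditioned = apply_conditions(y, x, mask)
```

Because the mask has one entry per object and time step, an initial state, a goal state, or any intermediate state can be fixed for any subset of objects, which is the kind of constraint an autoregressive rollout from initial states cannot express directly.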