🤖 AI Summary
This work addresses the estimation of continuous material parameters from videos of multiple interacting objects, a setting where existing methods generalize poorly because they are limited to single-object scenes or to discrete material classification. To this end, we propose MOSIV, a framework that, for the first time, jointly estimates per-object continuous material properties from multi-object interaction videos rich in contact dynamics. MOSIV combines differentiable physics simulation with a geometry-aware alignment objective and adds fine-grained, object-level supervision, which improves both optimization stability and estimation accuracy. On a newly constructed synthetic benchmark of complex multi-object interactions, MOSIV substantially outperforms adapted baselines in system identification accuracy and long-horizon simulation fidelity.
📝 Abstract
We introduce the challenging problem of multi-object system identification from videos, for which prior methods are ill-suited due to their focus on single-object scenes or discrete material classification with a fixed set of material prototypes. To address this, we propose MOSIV, a new framework that directly optimizes for continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video. We also present a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation. On this benchmark, MOSIV substantially improves grounding accuracy and long-horizon simulation fidelity over adapted baselines, establishing it as a strong baseline for this new task. Our analysis shows that object-level fine-grained supervision and geometry-aligned objectives are critical for stable optimization in these complex, multi-object settings. The source code and dataset will be released.
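To make the idea of optimizing continuous, per-object parameters through a simulator concrete, here is a minimal toy sketch. It is not the MOSIV implementation. It assumes two unit-mass objects on 1D springs joined by a weak coupling link. Central finite differences stand in for automatic differentiation through a differentiable simulator, and a per-object mean squared position error stands in for the geometric objectives derived from video. All names and constants (`simulate`, `object_losses`, `identify`, the coupling strength, the initial state) are hypothetical and chosen for illustration only.

```python
# Toy per-object system identification (illustrative; not the MOSIV code).

def simulate(stiffness, steps=200, dt=0.01):
    """Roll out two unit-mass objects on springs, coupled by a weak link.

    stiffness: [k0, k1], one continuous parameter per object.
    Returns one position trajectory per object.
    """
    coupling = 2.0            # hypothetical interaction strength
    pos = [1.0, -0.5]         # hypothetical initial positions
    vel = [0.0, 0.0]
    traj = [[], []]
    for _ in range(steps):
        link = coupling * (pos[1] - pos[0])   # force on object 0 from object 1
        acc = [-stiffness[0] * pos[0] + link,
               -stiffness[1] * pos[1] - link]
        for i in range(2):
            vel[i] += dt * acc[i]             # semi-implicit Euler update
            pos[i] += dt * vel[i]
            traj[i].append(pos[i])
    return traj


def object_losses(stiffness, observed):
    """Mean squared position error, computed separately for each object."""
    sim = simulate(stiffness, steps=len(observed[0]))
    return [
        sum((s - o) ** 2 for s, o in zip(sim[i], observed[i])) / len(observed[i])
        for i in range(2)
    ]


def total_loss(stiffness, observed):
    """Sum of the per-object losses (object-level supervision)."""
    return sum(object_losses(stiffness, observed))


def gradient(stiffness, observed, eps=1e-4):
    """Central finite differences; a stand-in for autodiff through the simulator."""
    grad = []
    for i in range(len(stiffness)):
        hi = list(stiffness)
        lo = list(stiffness)
        hi[i] += eps
        lo[i] -= eps
        grad.append((total_loss(hi, observed) - total_loss(lo, observed)) / (2 * eps))
    return grad


def identify(observed, init, iters=100):
    """Gradient descent with a backtracking step size; the loss never increases."""
    est = list(init)
    loss = total_loss(est, observed)
    step = 1.0
    for _ in range(iters):
        g = gradient(est, observed)
        while step > 1e-8:
            # Keep stiffness positive so the toy simulation stays physical.
            cand = [max(1e-3, e - step * gi) for e, gi in zip(est, g)]
            cand_loss = total_loss(cand, observed)
            if cand_loss < loss:
                est, loss = cand, cand_loss
                step *= 2.0       # accepted: try a larger step next time
                break
            step *= 0.5           # rejected: shrink the step and retry
    return est, loss


# Usage: generate "observations" from known parameters, then recover them.
true_stiffness = [12.0, 5.0]
observed = simulate(true_stiffness)
initial_guess = [8.0, 8.0]
estimate, final_loss = identify(observed, initial_guess)
print("estimate:", estimate, "final loss:", final_loss)
```

Because the two objects are coupled, each trajectory depends on both stiffness values, so the parameters have to be estimated jointly rather than one object at a time. Keeping the loss as a sum of per-object terms mirrors, in a very simplified way, the object-level supervision the abstract identifies as important for stable optimization.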