🤖 AI Summary
This work addresses the unsupervised estimation of continuous dynamical system parameters from a single video sequence, including systems whose dynamics are not purely motion-based, without relying on frame prediction or manual annotations. We propose a variational autoencoder framework with embedded physical constraints that replaces pixel-wise reconstruction with a KL-divergence loss in the latent space, thereby avoiding trivial solutions and improving robustness to parameter initialization. Continuous dynamics are modeled via neural ordinary differential equations (neural ODEs), enabling end-to-end differentiable optimization. We introduce Delfys75, a real-world video benchmark dataset covering multiple dynamical systems for this task. Experiments on both synthetic and real videos show improved parameter estimation accuracy, alongside reduced model size and faster training. All code and data are publicly released.
📝 Abstract
Extracting physical dynamical system parameters from recorded observations is key in natural science. Current methods for automatic parameter estimation from video train supervised deep networks on large datasets. Such datasets require labels, which are difficult to acquire. While some unsupervised techniques exist, they depend on frame prediction, suffer from long training times and initialization instabilities, consider only motion-based dynamical systems, and are evaluated mainly on synthetic data. In this work, we propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos. The method is suitable for different dynamical systems beyond motion and is robust to initialization. Moreover, we remove the need for frame prediction by implementing a KL-divergence-based loss function in the latent space, which avoids convergence to trivial solutions and reduces model size and compute. We first evaluate our model on synthetic data, as is commonly done. We then take the field closer to reality by recording Delfys75, our own real-world dataset of 75 videos covering five different types of dynamical systems, on which we evaluate our method and others. Our method compares favorably to others. Code and data are available online: https://github.com/Alejandro-neuro/Learning_physics_from_video.
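To make the idea of a latent-space KL-divergence loss concrete, the sketch below computes the standard closed-form KL divergence between two diagonal Gaussians. The abstract does not specify the exact form of the loss, so this is only an illustration under the assumption that both the encoded latent state and the latent state predicted by the physical model are described by a mean and a variance per dimension. The function name `kl_diag_gaussians` and its arguments are hypothetical and are not taken from the released code.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for diagonal Gaussians.

    p = N(mu_p, diag(var_p)) and q = N(mu_q, diag(var_q)).
    All arguments are equal-length sequences of floats; variances must be > 0.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term: 0.5 * (log(vq/vp) + (vp + (mp - mq)^2) / vq - 1)
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


# Assumed usage: compare an encoder's latent estimate with the latent state
# predicted by integrating the governing equations (values are made up).
encoded_mu, encoded_var = [0.0, 1.0], [1.0, 1.0]
predicted_mu, predicted_var = [0.0, 1.0], [1.0, 1.0]
loss = kl_diag_gaussians(encoded_mu, encoded_var, predicted_mu, predicted_var)
```

When the two distributions coincide the divergence is zero, and it grows as the encoded and physics-predicted latent states drift apart. A loss of this kind is evaluated on low-dimensional latent vectors rather than on full frames, which is in line with the reduced model size and compute the abstract describes.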