🤖 AI Summary
Problem: Analyzing and forecasting crowd dynamics in real-world low-quality videos remains challenging due to poor data fidelity, highly complex spatiotemporal variability, and the lack of physical interpretability in existing approaches. Method: This paper proposes a continuous-time neural stochastic differential equation (Neural SDE) framework grounded in physical priors of active matter, modeling crowds as self-propelled agents subject to stochastic forces. By unifying mechanistic physics with data-driven learning, the framework enables interpretable, continuous-time modeling of dense crowd dynamics. It supports weakly supervised learning, overcoming limitations of conventional discrete-time, black-box models. Contribution/Results: Evaluated on multiple high-density crowd datasets, the method outperforms the existing approaches adapted for comparison and supports high-fidelity trajectory prediction, counterfactual simulation, and dynamic attribution analysis, demonstrating both accuracy and physical plausibility.
📝 Abstract
Video-based analysis and prediction of high-density crowds has been a long-standing topic in computer vision. It is notoriously difficult for reasons that include, but are not limited to, the lack of high-quality data and the complexity of crowd dynamics. Consequently, it has been relatively understudied. In this paper, we propose a new approach that aims to learn from in-the-wild videos, which are often of such low quality that it is difficult to track individuals or count heads. The key novelty is a new physics prior for modeling crowd dynamics. We model high-density crowds as active matter, a continuum with active particles subject to stochastic forces, which we name 'crowd material'. Our physics model is combined with neural networks, resulting in a neural stochastic differential equation system that can mimic complex crowd dynamics. Due to the lack of similar research, we adapt a range of existing methods that are close to ours for comparison. Through exhaustive evaluation, we show that our model outperforms existing methods in analyzing and forecasting extremely high-density crowds. Furthermore, since our model is a continuous-time physics model, it can be used for simulation and analysis, providing strong interpretability. This is categorically different from most deep learning methods, which are discrete-time, black-box models.
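To make the idea of "active particles subject to stochastic forces" concrete, the following is a minimal sketch of an Euler-Maruyama simulation of a generic self-propelled particle SDE. It is not the model proposed in the paper: the drift function, the preferred speed `v_pref`, the relaxation time `tau`, and the noise scale `sigma` are all hypothetical choices made for illustration. In a neural SDE, a hand-written drift like this one would typically be replaced or augmented by a learned network.

```python
import numpy as np

def drift(v, v_pref=1.3, tau=0.5):
    """Hypothetical drift term: relax each velocity toward a preferred
    velocity of magnitude v_pref along the +x axis (self-propulsion)."""
    target = np.zeros_like(v)
    target[:, 0] = v_pref
    return (target - v) / tau

def euler_maruyama(x0, v0, dt=0.01, steps=500, sigma=0.3, seed=0):
    """Integrate dx = v dt, dv = drift(v) dt + sigma dW with Euler-Maruyama.

    x0, v0: arrays of shape (n_particles, 2).
    Returns the position trajectory (steps + 1, n_particles, 2) and the
    final velocities (n_particles, 2).
    """
    rng = np.random.default_rng(seed)
    x, v = x0.copy(), v0.copy()
    traj = np.empty((steps + 1, *x.shape))
    traj[0] = x
    for k in range(steps):
        # Brownian increment: zero mean, standard deviation sqrt(dt)
        dW = rng.normal(0.0, np.sqrt(dt), size=v.shape)
        v = v + drift(v) * dt + sigma * dW
        x = x + v * dt
        traj[k + 1] = x
    return traj, v

if __name__ == "__main__":
    n = 4
    traj, v_final = euler_maruyama(np.zeros((n, 2)), np.zeros((n, 2)))
    print(traj.shape)
    print(v_final.mean(axis=0))
```

With `sigma=0.0` the noise vanishes and every velocity relaxes deterministically toward the preferred velocity. With `sigma > 0`, repeated runs with different seeds give an ensemble of plausible trajectories, and this kind of continuous-time sampling is what makes simulation and what-if analysis possible.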