🤖 AI Summary
This work addresses socially aware pedestrian trajectory prediction by jointly modeling individual intent, social interaction, and intrinsic stochasticity, while keeping these factors disentangled and interpretable. We propose the “co-vibration” modeling paradigm, which frames pedestrian motion as a multi-source vibrational system driven by intent, modulated by social forces, and perturbed by individual noise. A vibration-dynamics-based trajectory encoder separates the spectral components corresponding to each factor via frequency-domain analysis and learns them independently; the components are then linearly superposed to reconstruct trajectories. We also model social interaction as a physical resonance process, which, to our knowledge, has not been done before, so that social relationships are represented through spectral features that are easier to inspect and interpret. The method achieves significant improvements in average displacement error (ADE) and final displacement error (FDE) on standard benchmarks including ETH and UCY. Qualitative analysis indicates that it captures both social norms and individual randomness.
📝 Abstract
Learning to forecast the trajectories of intelligent agents has attracted growing attention recently. However, it remains a challenge to accurately account for agents' intentions and social behaviors when forecasting, and in particular to simulate the unique randomness within each of those components in an explainable and decoupled way. Inspired by vibration systems and their resonance properties, we propose the Resonance model (Re for short) to encode and forecast pedestrian trajectories in the form of “co-vibrations”. It decomposes trajectory modifications and their randomness into multiple vibration portions, each simulating an agent's reaction to a single cause, and forecasts trajectories as the superposition of these separately modeled, independent vibrations. In addition, thanks to these vibrations and their spectral properties, representations of social interactions can be learned by emulating the resonance phenomenon, which further enhances explainability. Experiments on multiple datasets verify its usefulness both quantitatively and qualitatively.
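
The superposition idea can be illustrated with a minimal sketch. It is not the Resonance model itself: it only shows that a trajectory split into disjoint groups of spectral components can be recovered by adding the per-group signals back together. The hand-written `dft`/`idft` helpers, the toy trajectory, the bin boundaries in `edges`, and the labels `intent`, `social`, and `noise` are all assumptions made for illustration; in the paper the components are learned, not assigned by fixed bin index.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a complex-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_bins(spectrum, edges):
    """Split a spectrum into disjoint groups of bins (hypothetical helper).

    Each returned spectrum keeps the bins in [lo, hi) and zeroes the rest,
    so the groups sum back to the original spectrum.
    """
    n = len(spectrum)
    groups = []
    lo = 0
    for hi in list(edges) + [n]:
        groups.append([spectrum[k] if lo <= k < hi else 0j for k in range(n)])
        lo = hi
    return groups


# Toy trajectory: 8 planar points encoded as complex numbers x + iy.
trajectory = [complex(t, 0.5 * (t % 2) + 0.1 * t) for t in range(8)]

spectrum = dft(trajectory)

# Arbitrary, purely illustrative assignment of bins to three "causes".
intent, social, noise = (idft(g) for g in split_bins(spectrum, edges=[2, 5]))

# Linear superposition of the three component signals.
reconstructed = [a + b + c for a, b, c in zip(intent, social, noise)]

max_error = max(abs(p - q) for p, q in zip(trajectory, reconstructed))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the transform is linear and the bin groups are disjoint and exhaustive, the sum of the three component signals matches the input up to floating-point error. This is the property that lets each component be modeled on its own before the forecasts are combined.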