🤖 AI Summary
This work addresses the limitations of existing trajectory prediction methods in partially observable environments. Such methods assume access to global state, infer beliefs poorly from partial observations, and lack cognitive-behavioral constraints, which yields predictions that are physically implausible and hard to deploy. To overcome these issues, the authors propose FEP-Diff, an agent-centric prediction framework grounded in the Free Energy Principle. It employs a dual-branch spatiotemporal encoder to extract ego-motion and social interaction features from local observations, combines goal-conditioned belief learning with a social consistency constraint to improve cognitive plausibility, and introduces a residual diffusion generator with token-level proxy conditioning to improve both diversity and accuracy. Evaluated on five public benchmarks, FEP-Diff consistently outperforms state-of-the-art models under restricted observability.
📝 Abstract
Trajectory prediction methods have demonstrated remarkable capabilities in capturing complex motion patterns. However, existing methods rely on global state assumptions, suffer from insufficient belief inference under partial observability, and lack cognitive behavioral constraints in prediction. These limitations severely compromise both deployment feasibility and physical plausibility in real-world settings. In this work, we propose FEP-Diff, an agent-centric trajectory prediction framework grounded in the Free Energy Principle, aimed at achieving cognitively plausible predictions under realistic constraints. Specifically, a dual-branch spatiotemporal encoder extracts ego-motion dynamics and social interaction cues from local observations. Building upon this, a goal-conditioned belief learner infers multimodal latent belief distributions optimized via a free-energy objective, with a social consistency constraint on the local neighborhood graph to promote cognitive alignment among neighboring agents. Finally, a residual diffusion trajectory generator is conditioned on the learned belief representations with token-level proxy conditioning, producing precise and diverse future predictions. Extensive experiments on five public benchmarks demonstrate that FEP-Diff consistently outperforms state-of-the-art methods under restricted observability. Code: https://anonymous.4open.science/r/FEP-Diff-8876.
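The abstract does not spell out the free-energy objective or the social consistency constraint, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes diagonal Gaussian beliefs, a squared-error accuracy term (a Gaussian negative log-likelihood up to scale and constants), and a simple pairwise penalty over neighbor edges. All function names and the `beta` weight are hypothetical.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def free_energy(pred, target, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Variational free energy sketch: accuracy term plus weighted complexity term.

    Accuracy is a squared error between predicted and observed coordinates;
    complexity is the KL divergence from the belief q to the prior p.
    """
    accuracy = sum((p - t) ** 2 for p, t in zip(pred, target))
    return accuracy + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


def social_consistency(beliefs, edges):
    """Sum of squared differences between belief means of neighboring agents.

    `beliefs` maps an agent id to its belief mean vector; `edges` lists
    (i, j) pairs from a local neighborhood graph (assumed structure).
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(beliefs[i], beliefs[j]))
        for i, j in edges
    )
```

In a training loop under these assumptions, the two terms would be summed with a weighting coefficient and minimized jointly; the belief parameters would come from the encoder rather than being passed in by hand as they are here.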