Can Image-To-Video Models Simulate Pedestrian Dynamics?

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modeling realistic pedestrian dynamics in crowded public scenes remains challenging for generative models, particularly for image-to-video (I2V) diffusion transformers (DiTs), which are trained without explicit behavioral priors.
Method: We propose a keyframe-conditioned generation paradigm driven by pedestrian trajectory benchmarks, establishing a unified evaluation framework that jointly assesses visual fidelity and trajectory-level dynamics. We introduce quantitative trajectory metrics—including displacement distributions, velocity statistics, and interaction density—to measure dynamic plausibility and temporal consistency.
Contribution/Results: Our experiments demonstrate that DiT-based I2V models, despite lacking explicit pedestrian behavior modeling, spontaneously generate videos whose trajectory statistics closely approximate those of real-world pedestrian data across multiple metrics. This work pioneers the use of DiT-based I2V models as implicit pedestrian dynamical simulators and provides a reproducible, trajectory-aware benchmark for evaluating generative models in social behavior modeling.
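The trajectory metrics named above (displacement distributions, velocity statistics, interaction density) can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation: the function names and the data format (each trajectory a list of (x, y) positions sampled at a fixed frame interval) are assumptions.

```python
import math

def step_displacements(traj):
    """Per-step displacement magnitudes for one trajectory [(x, y), ...]."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]

def speed_stats(trajs, dt=0.4):
    """Mean and std of speeds pooled over all trajectories (dt: seconds per frame)."""
    speeds = [d / dt for t in trajs for d in step_displacements(t)]
    mean = sum(speeds) / len(speeds)
    var = sum((s - mean) ** 2 for s in speeds) / len(speeds)
    return mean, math.sqrt(var)

def interaction_density(frames, radius=1.0):
    """Average number of pedestrian pairs closer than `radius` (meters) per frame.

    `frames` is a list of per-frame position lists: [[(x, y), ...], ...].
    """
    total = 0
    for pts in frames:
        total += sum(1 for i in range(len(pts)) for j in range(i + 1, len(pts))
                     if math.dist(pts[i], pts[j]) < radius)
    return total / len(frames)
```

In an evaluation like the one described, these statistics would be computed once over trajectories extracted from generated videos and once over the benchmark's ground-truth trajectories, then compared.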

📝 Abstract
Recent high-performing image-to-video (I2V) models based on variants of the diffusion transformer (DiT) have displayed remarkable inherent world-modeling capabilities by virtue of training on large-scale video datasets. We investigate whether these models can generate realistic pedestrian movement patterns in crowded public scenes. Our framework conditions I2V models on keyframes extracted from pedestrian trajectory benchmarks, then evaluates their trajectory prediction performance using quantitative measures of pedestrian dynamics.
Problem

Research questions and friction points this paper is trying to address.

Investigating I2V models' ability to simulate pedestrian movement patterns
Evaluating generated trajectory predictions using quantitative dynamics measures
Testing world-modeling capabilities on crowded public scene simulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion transformer models for video generation
Conditions models on pedestrian trajectory keyframes
Evaluates performance with quantitative dynamics measures
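The last point, evaluating with quantitative dynamics measures, implies comparing the distribution of a statistic (e.g., pooled speeds) between generated and real trajectories. A minimal, self-contained sketch of such a comparison, assuming a two-sample Kolmogorov-Smirnov statistic as the distance (the paper may use different measures):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of samples `a` and `b` (both non-empty)."""
    a, b = sorted(a), sorted(b)

    def cdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    # The gap can only change at observed values, so check those points.
    return max(abs(cdf(a, x) - cdf(b, x)) for x in set(a) | set(b))
```

A small statistic (near 0) would indicate that the generated speed distribution closely tracks the real one; a value near 1 would indicate near-disjoint distributions.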