🤖 AI Summary
This work proposes a diffusion-based approach to crowd simulation that explicitly incorporates environmental context and multi-level social interactions, whereas most existing methods focus primarily on social dynamics. Within the diffusion framework, a structured conditioning module encodes semantic environmental cues (obstacles, objects of interest, and lighting levels), while a graph-based module captures both fine-grained interpersonal relations and group-level conformity. Evaluations on multiple benchmark datasets show that the proposed method outperforms current state-of-the-art approaches, and the structured environmental encoding provides interpretable signals for scene constraints and attractors. These results underscore the importance of environmental context and hierarchical social modeling for realistic crowd simulation.
📝 Abstract
Modeling realistic pedestrian trajectories requires accounting for both social interactions and environmental context, yet most existing approaches emphasize social dynamics. We propose EnvSocial-Diff, a diffusion-based crowd simulation model informed by social physics and augmented with environmental conditioning and individual-group interaction. Our structured environmental conditioning module explicitly encodes obstacles, objects of interest, and lighting levels, providing interpretable signals that capture scene constraints and attractors. In parallel, the individual-group interaction module goes beyond individual-level modeling by capturing both fine-grained interpersonal relations and group-level conformity through a graph-based design. Experiments on multiple benchmark datasets demonstrate that EnvSocial-Diff outperforms the latest state-of-the-art methods, underscoring the importance of explicit environmental conditioning and multi-level social interaction for realistic crowd simulation. Code is available at https://github.com/zqyq/EnvSocial-Diff.
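To make the idea of structured conditioning concrete, the following is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that). It assumes each pedestrian's conditioning vector concatenates hand-built environmental features (direction away from the nearest obstacle, direction toward the nearest object of interest, a scalar lighting level) with a simple group-level signal, here the mean velocity of the pedestrian's group. All names (`Pedestrian`, `env_features`, `group_mean_velocity`, `conditioning_vector`) and the feature layout are hypothetical. The abstract does not specify how features are built, and the paper's graph-based module is replaced here by a plain group average.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Pedestrian:
    pos: Point    # current position (x, y)
    vel: Point    # current velocity (vx, vy)
    group: int    # hypothetical group label; assumed to be given


def unit_direction(src: Point, dst: Point) -> Tuple[Point, float]:
    """Return the unit vector from src to dst and the distance between them."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0), 0.0
    return (dx / dist, dy / dist), dist


def nearest(point: Point, candidates: Sequence[Point]) -> Point:
    """Pick the candidate closest to point (Euclidean distance)."""
    return min(candidates, key=lambda c: math.hypot(c[0] - point[0], c[1] - point[1]))


def env_features(ped: Pedestrian,
                 obstacles: Sequence[Point],
                 attractors: Sequence[Point],
                 lighting: Callable[[Point], float]) -> List[float]:
    """Assumed layout: [away_x, away_y, obstacle_dist,
                        toward_x, toward_y, attractor_dist, light_level]."""
    (ox, oy), obstacle_dist = unit_direction(ped.pos, nearest(ped.pos, obstacles))
    (ax, ay), attractor_dist = unit_direction(ped.pos, nearest(ped.pos, attractors))
    # Negate the obstacle direction so the feature points away from the obstacle.
    return [-ox, -oy, obstacle_dist, ax, ay, attractor_dist, lighting(ped.pos)]


def group_mean_velocity(ped: Pedestrian, crowd: Sequence[Pedestrian]) -> List[float]:
    """Mean velocity over all pedestrians sharing ped's group label (ped included)."""
    members = [p for p in crowd if p.group == ped.group]
    n = len(members)
    return [sum(p.vel[0] for p in members) / n, sum(p.vel[1] for p in members) / n]


def conditioning_vector(ped: Pedestrian,
                        crowd: Sequence[Pedestrian],
                        obstacles: Sequence[Point],
                        attractors: Sequence[Point],
                        lighting: Callable[[Point], float]) -> List[float]:
    """Concatenate environmental and group-level features into one vector."""
    return env_features(ped, obstacles, attractors, lighting) + group_mean_velocity(ped, crowd)
```

In a diffusion-based simulator, a vector like this would typically be supplied to the denoising network as conditioning input at each step. Because every entry corresponds to a named scene quantity, the conditioning remains inspectable, which is the kind of interpretability the abstract attributes to explicit environmental encoding.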