Hyper-STTN: Social Group-aware Spatial-Temporal Transformer Network for Human Trajectory Prediction with Hypergraph Reasoning

📅 2024-01-12
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
Dense crowd trajectory prediction faces two challenges: complex pairwise spatiotemporal interactions and heterogeneous group dynamics. To address both, we propose a group-aware multi-scale modeling framework. First, we construct multi-scale hypergraphs that explicitly encode social group associations at varying group sizes. Second, we combine hypergraph spectral convolution, based on random-walk transition probabilities, with a spatiotemporal Transformer to align heterogeneous pairwise individual interactions with collective group coordination. Third, we introduce a multimodal Transformer fusion network to strengthen joint intention-trajectory reasoning. The method outperforms state-of-the-art baselines on five real-world pedestrian datasets, supporting the effectiveness and generalizability of multi-scale group-structure modeling for intent recognition and trajectory forecasting in dense scenarios.
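The summary does not say how the multi-scale hypergraphs are built. Below is a minimal sketch, assuming each hyperedge groups one pedestrian with its k nearest neighbours at a single frame and that each scale uses a different k. The helper names (knn_hyperedges, incidence_matrix, multiscale_hypergraphs) and the default scales (1, 2, 3) are hypothetical and are not taken from the authors' code.

```python
import math


def knn_hyperedges(positions, k):
    """Return hyperedges (frozensets of pedestrian indices) for one group size.

    Hypothetical construction: each pedestrian plus its k nearest neighbours
    at a single frame forms one hyperedge; duplicate groups are kept once.
    """
    n = len(positions)
    edges, seen = [], set()
    for i in range(n):
        # Other pedestrians ordered by Euclidean distance to pedestrian i
        # (sorted() is stable, so ties keep index order).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(positions[i], positions[j]),
        )
        edge = frozenset([i, *others[:k]])
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return edges


def incidence_matrix(n, edges):
    """H[v][e] = 1 if pedestrian v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in edges] for v in range(n)]


def multiscale_hypergraphs(positions, scales=(1, 2, 3)):
    """Map each scale k to an N x E incidence matrix of groups of size k + 1
    (k is clamped to N - 1 when there are too few pedestrians)."""
    n = len(positions)
    return {
        k: incidence_matrix(n, knn_hyperedges(positions, min(k, n - 1)))
        for k in scales
    }


# Two loose clusters of pedestrians at one frame.
positions = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
for k, H in multiscale_hypergraphs(positions).items():
    print(k, len(H[0]), "hyperedges")
```

Each scale yields an N x E incidence matrix whose columns are candidate groups: a small k captures tight pairs, while a larger k captures looser clusters. A learned or clustering-based grouping could replace the k-nearest-neighbour rule without changing the incidence-matrix interface.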

📝 Abstract
Predicting crowd intents and trajectories is crucial in various real-world applications, including service robots and autonomous vehicles. Understanding environmental dynamics is challenging, not only because of the complexity of modeling pair-wise spatial and temporal interactions but also because of the diverse influence of group-wise interactions. To decode comprehensive pair-wise and group-wise interactions in crowded scenarios, we introduce Hyper-STTN, a Hypergraph-based Spatial-Temporal Transformer Network for crowd trajectory prediction. In Hyper-STTN, crowd group-wise correlations are constructed using a set of multi-scale hypergraphs with varying group sizes and captured through random-walk probability-based hypergraph spectral convolution. Additionally, a spatial-temporal transformer is adapted to capture pedestrians' pair-wise latent interactions in the spatial and temporal dimensions. These heterogeneous group-wise and pair-wise features are then fused and aligned through a multimodal transformer network. Hyper-STTN outperforms other state-of-the-art baselines and ablation models on 5 real-world pedestrian motion datasets.
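One common reading of "random-walk probability-based hypergraph spectral convolution" uses the hypergraph random-walk transition matrix P = D_v^{-1} H W D_e^{-1} H^T, where H is the N x E incidence matrix, W holds the hyperedge weights, D_v the weighted pedestrian degrees and D_e the hyperedge sizes. The plain-Python sketch below applies one layer X' = ReLU(P X Theta). The exact normalization, the unit default weights and the ReLU are assumptions, not details the abstract confirms.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_walk_transition(H, w=None):
    """P = D_v^{-1} H W D_e^{-1} H^T for an N x E incidence matrix H.

    w holds hyperedge weights (default 1.0). Each row of P sums to 1:
    P[v][u] is the probability of stepping from pedestrian v to u by picking
    an incident hyperedge (proportional to w) and then one of its members uniformly.
    """
    n, m = len(H), len(H[0])
    if w is None:
        w = [1.0] * m
    edge_deg = [sum(H[v][e] for v in range(n)) for e in range(m)]
    vert_deg = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]
    if 0 in edge_deg or 0 in vert_deg:
        raise ValueError("every hyperedge and every pedestrian needs a nonzero degree")
    return [
        [sum(H[v][e] * w[e] * H[u][e] / edge_deg[e] for e in range(m)) / vert_deg[v]
         for u in range(n)]
        for v in range(n)
    ]


def hypergraph_conv(X, H, theta, w=None):
    """One layer: ReLU(P X Theta), with X of shape N x D_in and Theta D_in x D_out."""
    out = matmul(matmul(random_walk_transition(H, w), X), theta)
    return [[max(0.0, x) for x in row] for row in out]


H = [[1, 0], [1, 1], [0, 1]]  # pedestrian 1 belongs to both groups
print(hypergraph_conv([[1.0], [2.0], [3.0]], H, [[1.0]]))  # [[1.5], [2.0], [2.5]]
```

Because P is row-stochastic, each pedestrian's new feature is a weighted average over the members of the groups it belongs to, so a pedestrian shared by two groups mixes information from both.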
Problem

Research questions and friction points this paper is trying to address.

Modeling complex pairwise spatial-temporal interactions in crowd trajectories
Capturing heterogeneous groupwise dynamics through multiscale hypergraphs
Fusing multimodal features for accurate pedestrian trajectory prediction
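Pairwise spatial-temporal interactions are often modeled by attending over time for each pedestrian and then over pedestrians at each frame. The following is a minimal sketch of that factorized pattern, assuming single-head scaled dot-product attention with identity query, key and value projections (a trained model would learn these). It illustrates the general technique and is not Hyper-STTN's actual transformer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention, identity Q/K/V projections.

    seq is a list of L vectors of dimension D; each output vector is a convex
    combination of the inputs weighted by softmax(q . k / sqrt(D)).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def spatial_temporal_attention(x):
    """x[t][n] is pedestrian n's feature vector at frame t (shape T x N x D)."""
    T, N = len(x), len(x[0])
    # Temporal pass: each pedestrian attends over its own observed history.
    temporal = [self_attention([x[t][n] for t in range(T)]) for n in range(N)]
    x = [[temporal[n][t] for n in range(N)] for t in range(T)]  # back to T x N x D
    # Spatial pass: at each frame, every pedestrian attends to all pedestrians (pairwise).
    return [self_attention(x[t]) for t in range(T)]


frames = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 0.0]]]  # T=2, N=2, D=2
out = spatial_temporal_attention(frames)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
```

Factorizing the two passes keeps the cost at roughly T x N^2 + N x T^2 score computations instead of (T x N)^2 for full joint attention.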
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiscale hypergraphs model groupwise correlations via spectral convolution
Spatial-temporal transformer learns pairwise latent interactions
Multimodal transformer fuses heterogeneous groupwise and pairwise features
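The abstract states that group-wise and pair-wise features are fused and aligned by a multimodal transformer but does not give its layout. A minimal sketch, assuming one group-wise and one pair-wise feature vector per pedestrian, bidirectional cross-attention with residual connections, and concatenation of the two streams. The names cross_attention and fuse are hypothetical, and projections, multiple heads and feed-forward layers are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query attends over keys_values (used as both keys and values)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values])
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def fuse(pairwise, groupwise):
    """Hypothetical fusion: bidirectional cross-attention + residuals, then concat.

    pairwise[n] and groupwise[n] are pedestrian n's D-dimensional features from
    the pair-wise (transformer) and group-wise (hypergraph) branches.
    Returns N vectors of length 2 * D.
    """
    if len(pairwise) != len(groupwise):
        raise ValueError("both branches must provide one vector per pedestrian")
    p2g = cross_attention(pairwise, groupwise)  # pair-wise stream queries group cues
    g2p = cross_attention(groupwise, pairwise)  # group-wise stream queries pair cues
    fused_p = [[a + b for a, b in zip(x, y)] for x, y in zip(pairwise, p2g)]
    fused_g = [[a + b for a, b in zip(x, y)] for x, y in zip(groupwise, g2p)]
    return [fp + fg for fp, fg in zip(fused_p, fused_g)]  # per-pedestrian concat


print(fuse([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]))
# [[3.0, 2.0, 2.5, 2.5], [2.0, 3.0, 2.5, 2.5]]
```

Letting each stream query the other is one simple way to "align" heterogeneous features before a trajectory decoder; the residual connections keep each branch's original signal intact.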