AI Summary
This work addresses the challenges of multimodality and accuracy in pedestrian trajectory prediction within complex environments by proposing a probabilistic model that integrates Social-STGCNN with a conditional variational autoencoder (CVAE). The approach explicitly models the multimodal distribution of future trajectories, with the aim of producing diverse and well-calibrated predictions. On the ETH/UCY benchmark datasets, the method achieves moderate gains, with more consistent endpoint accuracy and improved trajectory diversity across crowd configurations. Evaluation on real-world pedestrian data collected by a mobile robot indicates that the approach remains effective beyond curated benchmarks and supports the model's applicability in practical deployments.
Abstract
Accurate pedestrian trajectory prediction is crucial for autonomous systems operating in complex environments, such as modular buses and delivery robots in suburban or semi-structured areas. Social Spatio-Temporal Graph Convolutional Neural Networks (Social-STGCNN) have shown strong performance by modeling social interactions; however, producing diverse and well-calibrated future trajectories remains challenging. In this work, we build on a Social-STGCNN backbone and introduce a Conditional Variational Autoencoder (CVAE)-based probabilistic formulation to explicitly model multimodal future trajectories. We evaluate the method on the ETH and UCY pedestrian trajectory datasets as well as on a real-world pedestrian dataset collected by a mobile robot. Results show moderate gains on public benchmarks, but more consistent endpoint accuracy and improved trajectory diversity across different crowd configurations. Evaluation on robot-collected data further demonstrates the approach's effectiveness beyond curated benchmarks and supports its applicability in practical deployments.