🤖 AI Summary
This work addresses the challenge of accurately predicting future 3D pedestrian poses in complex urban environments, where existing forecasting approaches often fall short because they do not explicitly model the dynamics of surrounding vehicles. To this end, we propose the first 3D pose forecasting framework that explicitly models multi-agent interactions between pedestrians and vehicles. Our approach extends the TBIFormer architecture with a vehicle encoder and a pedestrian-vehicle cross-attention mechanism, so that pose predictions are conditioned on dynamic vehicle states as well as past pedestrian motion. We further introduce Waymo-3DSkelMo+, an enhanced version of the Waymo-3DSkelMo dataset aligned with 3D vehicle bounding boxes, and devise a scene sampling strategy that groups scenes by pedestrian and vehicle count to cover varying interaction complexities. Experiments show substantial improvements in forecasting accuracy, underscoring the role of vehicle dynamics in making autonomous driving systems safer.
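To make the pedestrian-vehicle cross-attention idea concrete, here is a minimal sketch in plain Python. It assumes single-head scaled dot-product attention with a residual connection and no learned projections. The function names and feature layout are hypothetical and are not taken from the authors' code, which may differ substantially.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(ped_feats, veh_feats):
    """Single-head scaled dot-product cross-attention (illustrative only).

    ped_feats: list of pedestrian feature vectors, used as queries.
    veh_feats: list of vehicle feature vectors, used as keys and values.
    Learned Q/K/V projections are omitted here (an assumption for brevity).
    Returns one vehicle-conditioned vector per pedestrian.
    """
    dim = len(ped_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in ped_feats:
        # Similarity between this pedestrian and every vehicle.
        scores = [
            scale * sum(q * k for q, k in zip(query, key)) for key in veh_feats
        ]
        weights = softmax(scores)
        # Attention-weighted average of the vehicle features.
        context = [
            sum(w * veh[d] for w, veh in zip(weights, veh_feats))
            for d in range(dim)
        ]
        # Residual connection keeps the original pedestrian feature.
        fused.append([q + c for q, c in zip(query, context)])
    return fused
```

Each pedestrian feature acts as a query over all vehicle features, so the output has one vector per pedestrian regardless of how many vehicles are in the scene.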
📝 Abstract
Accurately predicting pedestrian motion is crucial for safe and reliable autonomous driving in complex urban environments. In this work, we present a 3D vehicle-conditioned pedestrian pose forecasting framework that explicitly incorporates surrounding vehicle information. To support this, we enhance the Waymo-3DSkelMo dataset with aligned 3D vehicle bounding boxes, enabling realistic modeling of multi-agent pedestrian-vehicle interactions. We introduce a sampling scheme that categorizes scenes by pedestrian and vehicle count, facilitating training across varying interaction complexities. Our proposed network adapts the TBIFormer architecture with a dedicated vehicle encoder and a pedestrian-vehicle cross-attention module that fuses pedestrian and vehicle features, allowing predictions to be conditioned on both historical pedestrian motion and surrounding vehicles. Extensive experiments demonstrate substantial improvements in forecasting accuracy and validate different approaches for modeling pedestrian-vehicle interactions, highlighting the importance of vehicle-aware 3D pose prediction for autonomous driving. Code is available at: https://github.com/GuangxunZhu/VehCondPose3D
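The scene categorization described in the abstract can be pictured as bucketing scenes by their (pedestrian count, vehicle count) pair. The sketch below is one hypothetical way to do this. The cap of 3, the dictionary keys `num_peds` and `num_vehs`, and the equal per-bucket draw are all assumptions made for illustration, not details given in the paper.

```python
import random
from collections import defaultdict


def bucket_key(num_peds, num_vehs, max_count=3):
    """Map a scene to a (pedestrians, vehicles) bucket.

    Counts are capped at max_count so that all crowded scenes share
    one bucket (the cap value is an assumption).
    """
    return (min(num_peds, max_count), min(num_vehs, max_count))


def group_scenes(scenes, max_count=3):
    """Group scene ids by their capped pedestrian and vehicle counts."""
    buckets = defaultdict(list)
    for scene in scenes:
        key = bucket_key(scene["num_peds"], scene["num_vehs"], max_count)
        buckets[key].append(scene["id"])
    return dict(buckets)


def sample_balanced(buckets, per_bucket, seed=0):
    """Draw up to per_bucket scene ids from every bucket.

    Equal draws per bucket are one possible policy (an assumption);
    they keep rare interaction settings from being swamped by common ones.
    """
    rng = random.Random(seed)
    picked = []
    for key in sorted(buckets):
        ids = buckets[key]
        picked.extend(rng.sample(ids, min(per_bucket, len(ids))))
    return picked


scenes = [
    {"id": "a", "num_peds": 1, "num_vehs": 0},
    {"id": "b", "num_peds": 1, "num_vehs": 0},
    {"id": "c", "num_peds": 5, "num_vehs": 7},
    {"id": "d", "num_peds": 2, "num_vehs": 1},
]
buckets = group_scenes(scenes)
batch = sample_balanced(buckets, per_bucket=1)
```

With this toy input, scenes `a` and `b` share the bucket `(1, 0)`, scene `c` falls into the capped bucket `(3, 3)`, and `batch` holds one scene id from each of the three buckets.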