AI Summary
Existing 3D human motion datasets predominantly rely on monocular RGB video, which suffers from occlusion and temporal discontinuity, yielding low-fidelity, unrealistic motion sequences. To address this, we introduce Waymo-3DSkelMo, the first large-scale, high-quality, temporally coherent 3D skeletal motion dataset tailored to pedestrian interaction modeling in autonomous driving. Leveraging LiDAR point clouds from the Waymo Perception dataset, our method integrates 3D human shape and motion priors to achieve high-fidelity multi-person skeletal reconstruction with strong temporal consistency. We further provide explicit interaction semantics and improve reconstruction accuracy via multi-agent tracking and rigid-body registration. Waymo-3DSkelMo covers more than 800 real-world urban scenes (over 14,000 seconds in total), with an average of 27 pedestrians per scene and up to 250 in the densest scene. It also establishes a 3D pose forecasting benchmark across multiple crowd densities, demonstrating the dataset's value for pedestrian motion prediction.
Abstract
Large-scale, high-quality 3D motion datasets with multi-person interactions are crucial for data-driven models in autonomous driving to achieve fine-grained pedestrian interaction understanding in dynamic urban environments. However, existing datasets mostly rely on estimating 3D poses from monocular RGB video frames, which suffer from occlusion and lack temporal continuity, resulting in unrealistic, low-quality human motion. In this paper, we introduce Waymo-3DSkelMo, the first large-scale dataset providing high-quality, temporally coherent 3D skeletal motions with explicit interaction semantics, derived from the Waymo Perception dataset. Our key insight is to utilize 3D human body shape and motion priors to enhance the quality of the 3D pose sequences extracted from the raw LiDAR point clouds. The dataset covers over 14,000 seconds across more than 800 real driving scenarios, including rich interactions among an average of 27 agents per scene (with up to 250 agents in the largest scene). Furthermore, we establish 3D pose forecasting benchmarks under varying pedestrian densities, and the results demonstrate its value as a foundational resource for future research on fine-grained human behavior understanding in complex urban environments. The dataset and code will be available at https://github.com/GuangxunZhu/Waymo-3DSkelMo
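The abstract does not spell out the evaluation protocol for the pose forecasting benchmark, but Mean Per-Joint Position Error (MPJPE) is the standard metric for this task. The sketch below illustrates how predicted skeletal sequences could be scored against ground truth; the array shapes (frames, joints, 3) and joint count are assumptions for illustration, not the dataset's actual format.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth joints, in the same units
    as the input coordinates (e.g. metres).

    pred, gt: (T, J, 3) arrays of T frames with J 3D joints each.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: a prediction offset by 0.1 m along x for every joint.
gt = np.zeros((10, 14, 3))   # 10 future frames, 14 joints (illustrative)
pred = gt.copy()
pred[..., 0] += 0.1
print(mpjpe(pred, gt))       # ~0.1
```

In a multi-person setting like this dataset's, the metric would typically be averaged over all tracked pedestrians in a scene, optionally reported separately per crowd-density bucket.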