JRDB-Pose3D: A Multi-person 3D Human Pose and Shape Estimation Dataset for Robotics

📅 2026-02-03
🤖 AI Summary
Existing 3D human pose datasets are largely confined to single-person or controlled settings, falling short of the perceptual demands faced by robots operating in real-world crowded environments. To address this gap, this work presents a large-scale, robot-centric dataset for multi-human 3D pose and shape estimation, captured via a mobile robotic platform in complex indoor and outdoor dynamic scenes. The dataset provides synchronized multi-view, multi-modal data with temporally consistent SMPL pose and shape parameters, individual tracking IDs, and rich social context annotations—including social grouping and demographic attributes. With an average of 5–10 people per frame and up to 35 individuals in dense scenarios, it explicitly incorporates realistic challenges such as occlusion and truncation, thereby enabling research in 3D human pose estimation, tracking, and behavior understanding, and filling a critical void in fine-grained human perception under real-world crowded conditions.

📝 Abstract
Real-world scenes are inherently crowded. Hence, estimating 3D poses of all nearby humans, tracking their movements over time, and understanding their activities within social and environmental contexts are essential for many applications, such as autonomous driving, robot perception, robot navigation, and human-robot interaction. However, most existing 3D human pose estimation datasets primarily focus on single-person scenes or are collected in controlled laboratory environments, which restricts their relevance to real-world applications. To bridge this gap, we introduce JRDB-Pose3D, which captures multi-human indoor and outdoor environments from a mobile robotic platform. JRDB-Pose3D provides rich 3D human pose annotations for such complex and dynamic scenes, including SMPL-based pose annotations with consistent body-shape parameters and track IDs for each individual over time. JRDB-Pose3D contains, on average, 5-10 human poses per frame, with some scenes featuring up to 35 individuals simultaneously. The proposed dataset presents unique challenges, including frequent occlusions, truncated bodies, and out-of-frame body parts, which closely reflect real-world environments. Moreover, JRDB-Pose3D inherits all available annotations from the JRDB dataset, such as 2D pose, information about social grouping, activities, and interactions, full-scene semantic masks with consistent human- and object-level tracking, and detailed annotations for each individual, such as age, gender, and race, making it a holistic dataset for a wide range of downstream perception and human-centric understanding tasks.
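To make the annotation structure concrete, the sketch below shows one plausible way per-frame SMPL annotations with persistent track IDs could be grouped into per-person tracklets. The field names (`frame`, `track_id`, `betas`, `pose`) are illustrative assumptions, not the dataset's actual schema; only the SMPL dimensionalities (10 shape coefficients, 24 joints × 3 axis-angle values = 72 pose values) follow the standard SMPL model.

```python
# Hypothetical sketch: grouping per-frame SMPL annotations into tracklets.
# Field names are assumptions; they are NOT JRDB-Pose3D's actual schema.
from collections import defaultdict

# Each record: frame index, persistent track ID, and SMPL parameters.
# SMPL uses 10 shape coefficients (betas) and 72 pose values
# (24 joints x 3 axis-angle components).
annotations = [
    {"frame": 0, "track_id": 3, "betas": [0.0] * 10, "pose": [0.0] * 72},
    {"frame": 1, "track_id": 3, "betas": [0.0] * 10, "pose": [0.0] * 72},
    {"frame": 0, "track_id": 7, "betas": [0.0] * 10, "pose": [0.0] * 72},
]

def build_tracklets(annos):
    """Group per-frame annotations into per-person tracklets, keyed by track ID,
    sorted by frame so each tracklet is temporally ordered."""
    tracklets = defaultdict(list)
    for a in sorted(annos, key=lambda a: a["frame"]):
        tracklets[a["track_id"]].append(a)
    return dict(tracklets)

tracklets = build_tracklets(annotations)
print(sorted(tracklets))   # track IDs present across the sequence
print(len(tracklets[3]))   # number of frames in which person 3 is annotated
```

Because the dataset advertises consistent body-shape parameters per identity, a consumer of such tracklets could, for example, average `betas` within a tracklet while treating `pose` as time-varying.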
Problem

Research questions and friction points this paper is trying to address.

3D human pose estimation
multi-person scenes
real-world environments
robot perception
human-robot interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-person 3D pose estimation
SMPL-based shape modeling
mobile robotic platform
real-world occlusions
holistic human-centric dataset