BEDLAM2.0: Synthetic Humans and Cameras in Motion

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing challenge of estimating 3D human motion in world coordinates from monocular video, a setting where accuracy suffers when the human and the camera move simultaneously and real-world ground-truth annotations are scarce. To this end, the authors introduce BEDLAM2.0, a new large-scale, high-fidelity synthetic dataset for this task. Its key additions are more diverse human body shapes, clothing, hairstyles, footwear, and complex 3D environments, coupled with realistic camera motion trajectories. BEDLAM2.0 provides rendered videos, ground-truth SMPL-X body parameters, and ground-truth camera poses. Extensive experiments demonstrate that models trained on BEDLAM2.0 achieve significantly improved accuracy in world-coordinate 3D pose and motion estimation, reducing average error by 18.7% relative to the original BEDLAM. This establishes a strong data foundation and performance benchmark for markerless motion reconstruction in unconstrained real-world scenarios.
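The per-frame ground-truth camera poses are what make training and evaluation in world coordinates possible. As a rough illustration only (not part of the dataset's tooling), a minimal numpy sketch of lifting a camera-frame body root position into world coordinates, assuming a hypothetical camera-to-world rotation R_wc and translation t_wc per frame, might look like this:

```python
import numpy as np

def camera_to_world(points_cam, R_wc, t_wc):
    """Map 3D points from camera coordinates to world coordinates.

    R_wc (3x3) and t_wc (3,) stand in for the ground-truth camera-to-world
    rotation and translation of one frame; the names and the convention
    (camera-to-world rather than world-to-camera) are assumptions here.
    """
    return points_cam @ R_wc.T + t_wc

# Toy example: a body root estimated 3 m in front of the camera,
# with the camera itself shifted 1 m along world x.
root_cam = np.array([[0.0, 0.0, 3.0]])
R_wc = np.eye(3)
t_wc = np.array([1.0, 0.0, 0.0])
print(camera_to_world(root_cam, R_wc, t_wc))  # [[1. 0. 3.]]
```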

📝 Abstract
Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human motion to be estimated in world coordinates. This is particularly challenging when there is both human and camera motion. Progress on this topic has been limited by the lack of rich video data with ground truth human and camera movement. We address this with BEDLAM2.0, a new dataset that goes beyond the popular BEDLAM dataset in important ways. In addition to introducing more diverse and realistic cameras and camera motions, BEDLAM2.0 increases the diversity and realism of body shape, motions, clothing, hair, and 3D environments. Additionally, it adds shoes, which were missing in BEDLAM. BEDLAM has become a key resource for training 3D human pose and motion regressors today, and we show that BEDLAM2.0 is significantly better, particularly for training methods that estimate humans in world coordinates. We compare state-of-the-art methods trained on BEDLAM and BEDLAM2.0, and find that BEDLAM2.0 significantly improves accuracy over BEDLAM. For research purposes, we provide the rendered videos, ground truth body parameters, and camera motions. We also provide the 3D assets to which we have rights and links to those from third parties.
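The ground truth body parameters follow the SMPL-X format. A minimal sketch of posing an SMPL-X model from such parameters with the open-source smplx Python package could look like the following; the parameter shapes, the zero values, and the model directory are placeholders for illustration, not BEDLAM2.0's actual file layout:

```python
import torch
import smplx  # pip install smplx; SMPL-X model files must be downloaded separately

# Hypothetical per-frame ground-truth parameters (here all zeros, i.e. the
# neutral shape in the rest pose); real values would be read from the dataset.
betas = torch.zeros(1, 10)          # body shape coefficients
global_orient = torch.zeros(1, 3)   # root orientation, axis-angle
body_pose = torch.zeros(1, 63)      # 21 body joints x 3, axis-angle
transl = torch.zeros(1, 3)          # root translation

model = smplx.create(
    model_path="models",            # folder containing the SMPL-X model files
    model_type="smplx",
    gender="neutral",
    use_pca=False,
    batch_size=1,
)
output = model(betas=betas, global_orient=global_orient,
               body_pose=body_pose, transl=transl)
print(output.vertices.shape)        # (1, 10475, 3): vertices of the posed mesh
```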
Problem

Research questions and friction points this paper is trying to address.

Estimating 3D human motion in world coordinates from video
Addressing challenges when both human and camera are moving
Overcoming limited training data with ground truth human and camera motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset with realistic human and camera motion
Enhanced diversity in body shapes, clothing, and environments
Provides ground truth data for 3D human pose estimation
Joachim Tesch
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Giorgio Becherini
Max Planck Institute for Intelligent Systems
Prerana Achar
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Anastasios Yiannakidis
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Muhammed Kocabas
Max Planck Institute for Intelligent Systems, ETH Zurich
Machine Learning, Computer Vision, 3D Vision, Computer Graphics
Priyanka Patel
Meshcapade GmbH
Michael J. Black
Max Planck Institute for Intelligent Systems
Computer Vision, Computer Graphics, Machine Learning, Virtual Humans, Digital Humans