BEDLAM2.0: Synthetic Humans and Cameras in Motion

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing challenge of estimating 3D human motion in world coordinates from monocular video, a setting where accuracy suffers when the human and the camera move simultaneously and real-world ground-truth annotations are scarce. To this end, the authors introduce BEDLAM2.0, a new large-scale, high-fidelity synthetic dataset for this task. Its key additions are more diverse human body shapes, clothing, hairstyles, footwear, and complex 3D environments, coupled with realistic camera motion trajectories. BEDLAM2.0 provides rendered videos, ground-truth SMPL-X body parameters, and ground-truth camera poses. Extensive experiments demonstrate that models trained on BEDLAM2.0 achieve significantly improved accuracy in world-coordinate 3D pose and motion estimation, reducing average error by 18.7% relative to the original BEDLAM. This establishes a strong data foundation and performance benchmark for markerless motion reconstruction in unconstrained real-world scenarios.
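The per-frame ground-truth camera poses are what make training and evaluation in world coordinates possible. As a rough illustration only (not part of the dataset's tooling), a minimal numpy sketch of lifting a camera-frame body root position into world coordinates, assuming a hypothetical camera-to-world rotation R_wc and translation t_wc per frame, might look like this:

```python
import numpy as np

def camera_to_world(points_cam, R_wc, t_wc):
    """Map 3D points from camera coordinates to world coordinates.

    R_wc (3x3) and t_wc (3,) stand in for the ground-truth camera-to-world
    rotation and translation of one frame; the names and the convention
    (camera-to-world rather than world-to-camera) are assumptions here.
    """
    return points_cam @ R_wc.T + t_wc

# Toy example: a body root estimated 3 m in front of the camera,
# with the camera itself shifted 1 m along world x.
root_cam = np.array([[0.0, 0.0, 3.0]])
R_wc = np.eye(3)
t_wc = np.array([1.0, 0.0, 0.0])
print(camera_to_world(root_cam, R_wc, t_wc))  # [[1. 0. 3.]]
```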

📝 Abstract
Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human motion to be estimated in world coordinates. This is particularly challenging when there is both human and camera motion. Progress on this topic has been limited by the lack of rich video data with ground truth human and camera movement. We address this with BEDLAM2.0, a new dataset that goes beyond the popular BEDLAM dataset in important ways. In addition to introducing more diverse and realistic cameras and camera motions, BEDLAM2.0 increases the diversity and realism of body shape, motions, clothing, hair, and 3D environments. Additionally, it adds shoes, which were missing in BEDLAM. BEDLAM has become a key resource for training 3D human pose and motion regressors today, and we show that BEDLAM2.0 is significantly better, particularly for training methods that estimate humans in world coordinates. We compare state-of-the-art methods trained on BEDLAM and BEDLAM2.0, and find that BEDLAM2.0 significantly improves accuracy over BEDLAM. For research purposes, we provide the rendered videos, ground truth body parameters, and camera motions. We also provide the 3D assets to which we have rights and links to those from third parties.
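The ground truth body parameters follow the SMPL-X format. A minimal sketch of posing an SMPL-X model from such parameters with the open-source smplx Python package could look like the following; the parameter shapes, the zero values, and the model directory are placeholders for illustration, not BEDLAM2.0's actual file layout:

```python
import torch
import smplx  # pip install smplx; SMPL-X model files must be downloaded separately

# Hypothetical per-frame ground-truth parameters (here all zeros, i.e. the
# neutral shape in the rest pose); real values would be read from the dataset.
betas = torch.zeros(1, 10)          # body shape coefficients
global_orient = torch.zeros(1, 3)   # root orientation, axis-angle
body_pose = torch.zeros(1, 63)      # 21 body joints x 3, axis-angle
transl = torch.zeros(1, 3)          # root translation

model = smplx.create(
    model_path="models",            # folder containing the SMPL-X model files
    model_type="smplx",
    gender="neutral",
    use_pca=False,
    batch_size=1,
)
output = model(betas=betas, global_orient=global_orient,
               body_pose=body_pose, transl=transl)
print(output.vertices.shape)        # (1, 10475, 3): vertices of the posed mesh
```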
Problem

Research questions and friction points this paper is trying to address.

Estimating 3D human motion in world coordinates from video
Addressing challenges when both human and camera are moving
Overcoming limited training data with ground truth human and camera motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset with realistic human and camera motion
Enhanced diversity in body shapes, clothing, and environments
Provides ground truth data for 3D human pose estimation
Joachim Tesch
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Giorgio Becherini
Max Planck Institute for Intelligent Systems
Prerana Achar
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Anastasios Yiannakidis
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Muhammed Kocabas
Max Planck Institute for Intelligent Systems, ETH Zurich
Machine Learning, Computer Vision, 3D Vision, Computer Graphics
Priyanka Patel
Meshcapade GmbH
Michael J. Black
Max Planck Institute for Intelligent Systems
Computer Vision, Computer Graphics, Machine Learning, Virtual Humans, Digital Humans