PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

📅 2026-03-30
🤖 AI Summary
This work addresses the scarcity of annotated data for 3D human mesh estimation and the limited photorealism and diversity of existing rendered synthetic datasets. It proposes PoseDreamer, a diffusion-based pipeline that generates human images together with 3D mesh annotations. The pipeline combines controllable image generation with Direct Preference Optimization for control alignment, curriculum-based hard sample mining, and multi-stage quality filtering, which together keep the generated images consistent with their 3D labels. It produces more than 500,000 synthetic samples and reports a 76% improvement in image-quality metrics over rendering-based datasets. Models trained on this data perform comparably to or better than those trained on real-world or traditionally rendered synthetic datasets, and combining PoseDreamer with synthetic datasets outperforms combining real-world and synthetic datasets.
📝 Abstract
Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engines that provide precise labels but suffer from limited photorealism, low diversity, and high production costs. In this work, we explore a third path: generated data. We introduce PoseDreamer, a novel pipeline that leverages diffusion models to generate large-scale synthetic datasets with 3D mesh annotations. Our approach combines controllable image generation with Direct Preference Optimization for control alignment, curriculum-based hard sample mining, and multi-stage quality filtering. Together, these components naturally maintain correspondence between 3D labels and generated images, while prioritizing challenging samples to maximize dataset utility. Using PoseDreamer, we generate more than 500,000 high-quality synthetic samples, achieving a 76% improvement in image-quality metrics compared to rendering-based datasets. Models trained on PoseDreamer achieve performance comparable to or superior to those trained on real-world and traditional synthetic datasets. In addition, combining PoseDreamer with synthetic datasets results in better performance than combining real-world and synthetic datasets, demonstrating the complementary nature of our dataset. We will release the full dataset and generation code.
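The abstract names Direct Preference Optimization as the control-alignment step but does not give its exact form. As a rough illustration only, the generic DPO objective for a single preference pair can be sketched as follows. The function name, the argument names, and the default `beta` are hypothetical, and reading the preferred sample as the generated image that follows the control signal more closely is an assumption rather than something the abstract states. A diffusion-specific variant may replace the log-probabilities with denoising errors.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair.

    All inputs are scalar log-probabilities; this is an illustrative
    sketch, not the paper's implementation.
    """
    # Margin: how much more the trained model favors the preferred sample
    # over the rejected one, relative to the frozen reference model.
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    # -log(sigmoid(beta * margin)) written as a numerically stable softplus.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# With no preference margin the loss equals log(2); it falls as the
# trained model favors the preferred sample more than the reference does.
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

Minimizing this loss pushes the model to rank the preferred sample above the rejected one while `beta` limits how far it drifts from the reference model.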
Problem

Research questions and friction points this paper is trying to address.

3D human mesh estimation
labeled dataset
photorealism
data generation
depth ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
3D human mesh
synthetic data generation
Direct Preference Optimization
hard sample mining
Authors

Lorenza Prospero
The Podium Institute for Sports Medicine and Technology, University of Oxford
Orest Kupyn
Visual Geometry Group, University of Oxford
Ostap Viniavskyi
Ukrainian Catholic University
João F. Henriques
Visual Geometry Group, University of Oxford
computer vision, machine learning, circulant matrices, Fourier analysis
Christian Rupprecht
University of Oxford
Machine Learning, Computer Vision