PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

📅 2026-03-30

🤖 AI Summary

This work addresses two problems in 3D human mesh estimation: annotated data is scarce, and existing synthetic datasets have limited realism and diversity. It proposes a diffusion-based framework that generates photorealistic human images with 3D mesh annotations. Controllable image generation keeps the generated images aligned with their 3D labels. Direct Preference Optimization improves control alignment, curriculum-based hard sample mining prioritizes challenging samples, and multi-stage quality filtering removes poor results. Together these steps automatically produce more than 500,000 high-quality images with corresponding 3D mesh labels. The generated data scores 76% better on image-quality metrics than rendering-based datasets. Models trained on it match or surpass models trained on real-world or traditionally rendered synthetic data. In addition, combining it with traditional synthetic data outperforms combining real-world and synthetic data, which indicates that generated data complements rendering-based datasets.
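
Direct Preference Optimization (DPO) trains a model to favor a "chosen" output over a "rejected" one, measured relative to a frozen reference model. The summary does not say how PoseDreamer builds its preference pairs. This sketch assumes, for illustration only, that the chosen image follows the pose control more closely than the rejected one. The code shows the standard DPO objective for one pair, computed from log-likelihoods. Exact likelihoods are intractable for diffusion models, so variants such as Diffusion-DPO substitute per-sample denoising losses. Treat this as the generic form, not PoseDreamer's exact loss. The helper names `dpo_loss` and `log_sigmoid` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Assumption for illustration: "chosen" is the generated image that better
    follows the pose control, "rejected" is the one that follows it worse.
    """
    # Log-ratio of each sample's likelihood under the trained policy vs. the frozen reference.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # The loss falls as the policy favors the chosen sample more than the reference does.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Policy identical to the reference: the loss equals log(2).
print(round(dpo_loss(-1.0, -1.0, -1.0, -1.0), 3))  # prints 0.693
```

Raising the chosen margin relative to the rejected margin pushes the loss below log(2). The hypothetical `beta` parameter controls how sharply the loss responds to that difference.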

📝 Abstract

Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engines that provide precise labels but suffer from limited photorealism, low diversity, and high production costs. In this work, we explore a third path: generated data. We introduce PoseDreamer, a novel pipeline that leverages diffusion models to generate large-scale synthetic datasets with 3D mesh annotations. Our approach combines controllable image generation with Direct Preference Optimization for control alignment, curriculum-based hard sample mining, and multi-stage quality filtering. Together, these components naturally maintain correspondence between 3D labels and generated images, while prioritizing challenging samples to maximize dataset utility. Using PoseDreamer, we generate more than 500,000 high-quality synthetic samples, achieving a 76% improvement in image-quality metrics compared to rendering-based datasets. Models trained on PoseDreamer achieve performance comparable to or superior to those trained on real-world and traditional synthetic datasets. In addition, combining PoseDreamer with synthetic datasets results in better performance than combining real-world and synthetic datasets, demonstrating the complementary nature of our dataset. We will release the full dataset and generation code.
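
The abstract names multi-stage quality filtering and curriculum-based hard sample mining but does not specify the stages, scores, or schedule. This sketch rests on several assumptions. Each sample carries three hypothetical scores: `quality`, `align_error`, and `difficulty`. Filtering applies simple threshold stages in sequence. The curriculum is a difficulty window that slides from easier to harder samples across rounds. None of these names or design choices come from the paper, and the toy values are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    # Hypothetical per-sample scores, not fields defined by PoseDreamer.
    sample_id: int
    quality: float      # image-quality score, higher is better
    align_error: float  # e.g. 2D keypoint error (pixels) between image and projected mesh
    difficulty: float   # e.g. the current mesh-estimation model's error on this sample


Stage = Callable[[Sample], bool]


def multi_stage_filter(samples: Sequence[Sample], stages: Sequence[Stage]) -> List[Sample]:
    """Apply each filtering stage in order; a sample must pass every stage to survive."""
    kept = list(samples)
    for stage in stages:
        kept = [s for s in kept if stage(s)]
    return kept


def curriculum_select(
    samples: Sequence[Sample], round_idx: int, num_rounds: int, keep_fraction: float = 0.5
) -> List[Sample]:
    """Select a difficulty window that slides from easiest (round 0) to hardest (last round)."""
    if not samples:
        return []
    ranked = sorted(samples, key=lambda s: s.difficulty)
    k = max(1, int(keep_fraction * len(ranked)))
    progress = round_idx / max(1, num_rounds - 1)  # 0.0 on the first round, 1.0 on the last
    start = int(progress * (len(ranked) - k))
    return ranked[start:start + k]


# Toy data (made-up values): sample 3 has low quality, sample 7 is badly misaligned.
toy = [
    Sample(i, quality=0.2 if i == 3 else 0.9, align_error=20.0 if i == 7 else 2.0, difficulty=float(i))
    for i in range(10)
]
stages = [lambda s: s.quality >= 0.5, lambda s: s.align_error <= 5.0]
kept = multi_stage_filter(toy, stages)
for r in range(3):
    print(r, [s.sample_id for s in curriculum_select(kept, r, num_rounds=3)])
# 0 [0, 1, 2, 4]
# 1 [2, 4, 5, 6]
# 2 [5, 6, 8, 9]
```

A sliding window is only one simple curriculum. Mining only the hardest samples corresponds to always taking the final window.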

Problem

3D human mesh estimation

labeled dataset

photorealism

data generation

depth ambiguity

Innovation

diffusion models

3D human mesh

synthetic data generation