InfiniHuman: Infinite 3D Human Creation with Precise Control

📅 2025-10-13
🤖 AI Summary
To address the high cost, limited scale, and insufficient diversity of manually captured and annotated 3D human data, this paper proposes two components. (1) InfiniHumanData is a fully automated pipeline that distills multimodal foundation models (vision-language and image-generation models) and combines SMPL-based parametric body modeling, multi-view rendering, and cross-modal alignment to construct a large-scale 3D human dataset of 111K distinct identities. Each identity is annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. (2) InfiniHumanGen is a diffusion-based framework conditioned jointly on text, body shape, and clothing assets, enabling fast, high-fidelity, and precisely controllable 3D avatar generation. Experiments show significant improvements over state-of-the-art methods in visual quality, generation speed, and controllability. A user study indicates that the automatically generated identities are indistinguishable from renderings of real scans. The data generation pipeline, the dataset, and the models will be publicly released.

📝 Abstract
Generating realistic and controllable 3D human avatars is a long-standing challenge, particularly when covering broad attribute ranges such as ethnicity, age, clothing style, and detailed body shape. Capturing and annotating large-scale human datasets for training generative models is prohibitively expensive, and the resulting datasets are limited in scale and diversity. The central question we address in this paper is: can existing foundation models be distilled to generate theoretically unbounded, richly annotated 3D human data? We introduce InfiniHuman, a framework that synergistically distills these models to produce richly annotated human data at minimal cost and with theoretically unlimited scalability. We propose InfiniHumanData, a fully automatic pipeline that leverages vision-language and image-generation models to create a large-scale multi-modal dataset. A user study shows that our automatically generated identities are indistinguishable from scan renderings. InfiniHumanData contains 111K identities spanning unprecedented diversity. Each identity is annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. Building on this dataset, we propose InfiniHumanGen, a diffusion-based generative pipeline conditioned on text, body shape, and clothing assets. InfiniHumanGen enables fast, realistic, and precisely controllable avatar generation. Extensive experiments demonstrate significant improvements over state-of-the-art methods in visual quality, generation speed, and controllability. Our approach offers a practical and affordable route to high-quality avatar generation with fine-grained control at effectively unbounded scale. We will publicly release the automatic data generation pipeline, the comprehensive InfiniHumanData dataset, and the InfiniHumanGen models at https://yuxuan-xue.com/infini-human.
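The abstract lists what each InfiniHumanData identity carries. As a minimal sketch, one such record can be modeled as a Python dataclass with a simple completeness check. The field names, file paths, the granularity keys ("short", "detailed"), and the 10-coefficient SMPL shape vector are all assumptions for illustration; the released dataset's actual format may differ.

```python
from dataclasses import dataclass

# Assumption: the common 10-coefficient SMPL shape vector; the source does not
# state how many betas InfiniHumanData stores.
SMPL_NUM_BETAS = 10


@dataclass
class IdentityRecord:
    """Hypothetical layout of one InfiniHumanData identity; field names are illustrative."""

    identity_id: str
    text_descriptions: dict[str, str]  # granularity level -> description
    multiview_rgb: list[str]           # paths to multi-view RGB images
    clothing_images: list[str]         # paths to detailed clothing images
    smpl_betas: list[float]            # SMPL body-shape parameters

    def problems(self) -> list[str]:
        """Return annotation problems; an empty list means the record looks complete."""
        issues = []
        if not self.text_descriptions:
            issues.append("missing text descriptions")
        if not self.multiview_rgb:
            issues.append("missing multi-view RGB images")
        if not self.clothing_images:
            issues.append("missing clothing images")
        if len(self.smpl_betas) != SMPL_NUM_BETAS:
            issues.append(f"expected {SMPL_NUM_BETAS} SMPL betas, got {len(self.smpl_betas)}")
        return issues


def descriptions_at(records: list[IdentityRecord], level: str) -> list[str]:
    """Collect descriptions at one granularity level, skipping records that lack it."""
    return [r.text_descriptions[level] for r in records if level in r.text_descriptions]


# Hypothetical example record; the paths and text are made up for illustration.
record = IdentityRecord(
    identity_id="id_000001",
    text_descriptions={
        "short": "an elderly man in a gray wool coat",
        "detailed": "an elderly man with short white hair wearing a gray wool coat over a navy sweater",
    },
    multiview_rgb=["id_000001/view_00.png", "id_000001/view_01.png"],
    clothing_images=["id_000001/coat.png"],
    smpl_betas=[0.0] * SMPL_NUM_BETAS,
)
print(record.problems())  # []
print(descriptions_at([record], "short"))
```

Keying text descriptions by granularity level is one way to reflect the "multi-granularity" annotations: a consumer can pick a short prompt or a detailed one per identity without changing the record layout.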
Problem

Generating realistic, controllable 3D human avatars
Overcoming the high cost and limited scale of human dataset creation
Distilling foundation models into theoretically unbounded 3D human data

Innovation

Distills foundation models into theoretically unlimited 3D human data
Automates dataset creation with vision-language and image-generation models
Uses a diffusion-based generative pipeline for controllable avatar generation
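
InfiniHumanGen is described only as a diffusion-based pipeline conditioned on text, body shape, and clothing assets; the source does not say how the conditions are applied at sampling time. One standard way to steer a conditional diffusion model is classifier-free guidance, which mixes the denoiser's conditional and unconditional noise predictions as eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows only that mixing step on plain Python lists. Treating it as part of InfiniHumanGen is an assumption, and `cfg_combine` is a hypothetical name.

```python
from typing import Sequence


def cfg_combine(eps_uncond: Sequence[float], eps_cond: Sequence[float], scale: float) -> list[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    Hypothetical helper for illustration; not taken from InfiniHumanGen.
    scale = 0 returns the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 pushes the sample further toward the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions standing in for a denoiser's outputs with and without
# the (assumed) text / body-shape / clothing conditions.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(cfg_combine(uncond, cond, 2.0))  # [2.0, 5.0]
```

A model trained with random condition dropout can, in principle, be guided per condition by computing one such difference term for each condition; whether InfiniHumanGen does anything like this is not stated in the source.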