PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion

📅 2024-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle with severe occlusion and full-body anatomical distortion in multi-person complex scenes, primarily due to the lack of 3D geometric constraints in 2D pose conditioning and excessive focus on facial identity at the expense of holistic, person-specific modeling. To address this, we propose the first diffusion-based framework for multi-person full-body generation. Our approach introduces (1) an occlusion-aware classifier-free guidance (O-CFG) mechanism and an occlusion-boundary enhancement network to improve robustness under occlusion; and (2) a dual-path SMPLx-ControlNet architecture that jointly leverages depth/normal maps, text fine-tuning, and a Face Identity ControlNet to enable 3D-consistent, controllable synthesis of full-body pose, identity, and morphology. Extensive experiments demonstrate significant improvements over state-of-the-art methods in quantitative metrics, while user studies confirm superior performance in identity fidelity, occlusion handling, and anatomical plausibility.

Technology Category

Application Category

📝 Abstract
We present PersonaCraft, a framework for controllable, occlusion-robust, full-body personalized image synthesis of multiple individuals in complex scenes. Current methods struggle with occlusion-heavy scenarios and full-body personalization: 2D pose conditioning lacks 3D geometry, often leading to ambiguous occlusions and anatomical distortions, and many approaches focus solely on facial identity. In contrast, PersonaCraft integrates diffusion models with 3D human modeling, employing SMPLx-ControlNet to utilize 3D geometry such as depth and normal maps for robust 3D-aware pose conditioning and enhanced anatomical coherence. To handle fine-grained occlusions, we propose an Occlusion Boundary Enhancer Network that exploits depth-edge signals with occlusion-focused training, and an Occlusion-Aware Classifier-Free Guidance strategy that selectively reinforces conditioning in occluded regions without affecting unoccluded areas. PersonaCraft can be seamlessly combined with a Face Identity ControlNet, achieving full-body multi-human personalization and thus marking a significant advance beyond prior approaches that concentrate only on facial identity. Our dual-pathway body-shape representation, combining SMPLx shape parameters with textual refinement, enables precise full-body personalization and flexible user-defined body-shape adjustments. Extensive quantitative experiments and user studies demonstrate that PersonaCraft significantly outperforms existing methods in generating high-quality multi-person images with accurate personalization and robust occlusion handling.
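The Occlusion-Aware Classifier-Free Guidance described in the abstract, which reinforces conditioning only in occluded regions, can be sketched as classifier-free guidance with a spatially varying guidance scale. This is a minimal illustrative sketch, not the authors' implementation; the function name, array shapes, and scale values (`w_base`, `w_occ`) are assumptions.

```python
import numpy as np

def occlusion_aware_cfg(eps_uncond, eps_cond, occ_mask, w_base=7.5, w_occ=12.0):
    """Classifier-free guidance with a per-pixel guidance scale.

    Sketch of the idea only: conditioning is reinforced where
    occ_mask == 1 (occluded pixels) while unoccluded pixels keep
    the standard scale. Shapes and scale values are illustrative
    assumptions, not the paper's settings.
    """
    # Per-pixel guidance weight: w_occ inside the occlusion mask, w_base elsewhere.
    w = w_base + (w_occ - w_base) * occ_mask
    # Standard CFG update, applied with the spatially varying weight.
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy example: a 1x4 "noise prediction" where the last two pixels are occluded.
eps_u = np.zeros((1, 4))
eps_c = np.ones((1, 4))
mask = np.array([[0.0, 0.0, 1.0, 1.0]])
out = occlusion_aware_cfg(eps_u, eps_c, mask)
# Effective guidance is w_occ (12.0) in occluded pixels, w_base (7.5) elsewhere.
```

In a real diffusion loop, `eps_uncond` and `eps_cond` would be the denoiser's noise predictions without and with conditioning, and `occ_mask` would come from rendered SMPLx depth ordering; here they are stand-in arrays.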
Problem

Research questions and friction points this paper is trying to address.

Generating personalized full-body images of multiple humans in complex scenes.
Handling occlusion-heavy scenarios via 3D-aware pose conditioning.
Achieving precise full-body personalization and flexible body-shape adjustments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates diffusion models with 3D human modeling (SMPLx-ControlNet).
Uses an Occlusion Boundary Enhancer Network for fine-grained occlusions.
Combines a Face Identity ControlNet for full-body personalization.
Gwanghyun Kim
Seoul National University (SNU)
Generative AI, Multimodal Learning, Computer Vision, 3D, Digital Humans
Suh Yoon Jeon
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Seunggyu Lee
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Se Young Chun
Department of Electrical and Computer Engineering, Seoul National University
computational imaging, machine learning, signal processing, multimodal processing