WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low fidelity of occluded regions (e.g., back and lateral views) in monocular dynamic human 3D reconstruction. We propose a dual-space optimization framework that pioneers the integration of Score Distillation Sampling (SDS) across canonical and observation spaces. Our method jointly leverages 2D diffusion-based generative priors, differentiable rendering, pose-aware feature modulation, and multi-view consistency constraints, augmented by a view-selection strategy to enhance visual coherence. The core innovation is a pose-guided, cross-view joint optimization mechanism for geometry and appearance. Experiments demonstrate that our approach significantly improves reconstruction quality of unseen regions from monocular input, achieving state-of-the-art photorealism and enabling high-fidelity dynamic human avatars.

📝 Abstract
In this paper, we present WonderHuman, which reconstructs dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to fully cover the observed human body. In daily practice, however, one usually has access to limited viewpoints, such as monocular front-view videos, leaving previous methods unable to faithfully reconstruct the unseen parts of the human avatar. To tackle this issue, WonderHuman leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique, applying Score Distillation Sampling (SDS) in both the canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce consistency between the SDS predictions and the observed data, ensuring pose-dependent effects and higher fidelity in the reconstructed avatar. In the experiments, our method achieves state-of-the-art performance in producing photorealistic renderings from the given monocular video, particularly for the challenging unseen parts. The project page and source code can be found at https://wyiguanw.github.io/WonderHuman/.
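For context, Score Distillation Sampling (as introduced in DreamFusion) optimizes 3D scene parameters by pushing rendered images toward a 2D diffusion prior. A standard form of its gradient is sketched below; the exact weighting and conditioning used by WonderHuman may differ, and in its Dual-Space Optimization this gradient would be applied to renders from both the canonical and observation spaces:

```latex
\nabla_\theta \mathcal{L}_{\text{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)
      \frac{\partial x}{\partial \theta}
    \right]
```

Here $x$ is the image rendered from parameters $\theta$, $x_t$ is its noised version at diffusion timestep $t$, $\epsilon$ is the injected Gaussian noise, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction under conditioning $y$ (e.g., a text or pose prompt), and $w(t)$ is a timestep-dependent weight.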
Problem

Research questions and friction points this paper is trying to address.

Single-view Video
High-fidelity Human Model Reconstruction
Unseen Body Parts
Innovation

Methods, ideas, or system contributions that make the work stand out.

WonderHuman
Single-view Video
High-precision Human Model Reconstruction
Zilong Wang
Department of Computer Science, The University of Texas at Dallas, Richardson, Texas
Zhiyang Dou
Computer Graphics Group, The University of Hong Kong, Pokfulam, Hong Kong
Yuan Liu
School of Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Cheng Lin
Computer Graphics Group, The University of Hong Kong, Pokfulam, Hong Kong
Xiao Dong
Unknown affiliation
DM, CV, ML
Yunhui Guo
UT Dallas
Computer Vision, Machine Learning, Edge Computing
Chenxu Zhang
ByteDance Inc.
Computer Graphics, Computer Vision, AI
Xin Li
Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
Wenping Wang
Texas A&M University
Computer Graphics, Geometric Computing
Xiaohu Guo
University of Texas at Dallas
Computer Graphics, Computer Vision, Geometric Computing