Generalizable Human Gaussians from Single-View Image

📅 2024-06-10

🏛️ arXiv.org

📈 Citations: 7

✨ Influential: 0


🤖 AI Summary

This work addresses jointly recovering geometry and appearance, including in unobserved (occluded) regions, for 3D human reconstruction from a single image. We propose a single-view generalizable Human Gaussian Model (HGM) built on a generate-then-refine pipeline guided by a human body prior and a diffusion prior. First, coarse human Gaussians are predicted. Then a ControlNet refines back-view images rendered from these coarse Gaussians, and the refined image is combined with the input image to reconstruct refined Gaussians. To avoid unrealistic poses and shapes, SMPL-X serves as a second branch: image features are propagated from the SMPL-X volume to the image Gaussians using sparse convolution and attention. Because the initial SMPL-X estimate may be inaccurate, the model progressively refines it. Experiments on several public datasets show that the method outperforms previous approaches in novel-view synthesis and surface reconstruction, including better fidelity in unobserved regions, and that it generalizes to cross-dataset evaluation and in-the-wild images.
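Rendering the back view of the coarse Gaussians relies on Gaussian splatting, which blends depth-sorted Gaussians front to back: C = Σ_i c_i α_i Π_{j<i} (1 − α_j). The sketch below shows that per-pixel compositing step. It is a minimal illustration under stated assumptions, not HGM's renderer: it assumes the Gaussians are already projected to 2D (mean and 2×2 covariance), borrows the alpha clamp and skip threshold used in the original 3D Gaussian Splatting work, and the function name `composite_pixel` is hypothetical. It is written in NumPy for readability; the same expressions are differentiable when written in an autodiff framework.

```python
import numpy as np

def composite_pixel(colors, opacities, depths, means2d, covs2d, pixel):
    """Front-to-back alpha compositing of projected 2D Gaussians at one pixel.

    Hypothetical helper for illustration.
    colors:    (N, 3) RGB per Gaussian
    opacities: (N,)   base opacity in [0, 1]
    depths:    (N,)   camera-space depth, used only for sorting
    means2d:   (N, 2) projected centers in pixel coordinates
    covs2d:    (N, 2, 2) projected 2D covariances
    pixel:     (2,)   pixel coordinate
    Returns (rgb, remaining transmittance).
    """
    order = np.argsort(depths)  # nearest Gaussian first
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        d = pixel - means2d[i]
        # Gaussian falloff exp(-0.5 * d^T Sigma^-1 d), scaled by opacity
        power = -0.5 * d @ np.linalg.inv(covs2d[i]) @ d
        alpha = min(0.99, opacities[i] * np.exp(power))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution, skip
        color += transmittance * alpha * colors[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque
    return color, transmittance
```

With a half-transparent red Gaussian in front of an opaque blue one, both centered on the pixel, the result is about 0.5 red plus 0.495 blue, with 0.005 transmittance left over.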


📝 Abstract

In this work, we tackle the task of learning 3D human Gaussians from a single image, focusing on recovering detailed appearance and geometry including unobserved regions. We introduce a single-view generalizable Human Gaussian Model (HGM), which employs a novel generate-then-refine pipeline guided by a human body prior and a diffusion prior. Our approach uses a ControlNet to refine back-view images rendered from the coarse predicted human Gaussians, then uses the refined image along with the input image to reconstruct refined human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch, propagating image features from the SMPL-X volume to the image Gaussians using sparse convolution and attention mechanisms. Given that the initial SMPL-X estimation might be inaccurate, we gradually refine it with our HGM model. We validate our approach on several publicly available datasets. Our method surpasses previous methods in both novel view synthesis and surface reconstruction. Our approach also exhibits strong generalization for cross-dataset evaluation and in-the-wild images.
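The abstract says image features are propagated from the SMPL-X volume to the image Gaussians with attention, but it does not specify the attention design. The sketch below shows one common way such propagation can work, assumed here purely for illustration: each Gaussian center queries its k nearest SMPL-X voxels with single-head scaled dot-product attention. The function name `knn_attention`, the neighbourhood size `k`, and the projection matrices `Wq`, `Wk`, `Wv` are hypothetical and not taken from the paper.

```python
import numpy as np

def knn_attention(gauss_xyz, gauss_feat, vox_xyz, vox_feat, Wq, Wk, Wv, k=4):
    """Each Gaussian attends over its k nearest SMPL-X voxels (single head).

    Hypothetical sketch, not the HGM architecture.
    gauss_xyz: (N, 3) Gaussian centers, gauss_feat: (N, C) their features
    vox_xyz:   (M, 3) voxel centers,   vox_feat:   (M, C) voxel features
    Wq, Wk: (C, D) query/key projections; Wv: (C, D_v) value projection
    Returns (N, D_v) features propagated from the body volume.
    """
    # Pairwise squared distances between Gaussians and voxels, shape (N, M)
    d2 = ((gauss_xyz[:, None, :] - vox_xyz[None, :, :]) ** 2).sum(-1)
    k = min(k, vox_xyz.shape[0])
    nbr = np.argsort(d2, axis=1)[:, :k]         # (N, k) nearest voxel indices
    q = gauss_feat @ Wq                         # (N, D) queries from Gaussians
    keys = vox_feat[nbr] @ Wk                   # (N, k, D) keys from voxels
    vals = vox_feat[nbr] @ Wv                   # (N, k, D_v) values from voxels
    logits = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)           # softmax over the k neighbours
    return np.einsum("nk,nkd->nd", w, vals)     # weighted sum of values
```

Restricting attention to nearby voxels keeps the cost at O(N·k) attention weights instead of O(N·M), and it ties each Gaussian to the part of the body surface it sits on.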

Problem

Research questions and friction points this paper is trying to address.

Learning 3D human Gaussians from single-view images

Recovering detailed appearance and unobserved geometry

Improving realism with human priors and refinement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-view generalizable Human Gaussian Model (HGM)

ControlNet refines back-view images rendered from coarse Gaussians

SMPL-X prior as a dual branch, propagated with sparse convolution and attention
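A voxelized SMPL-X body occupies only a small fraction of its bounding grid, which is why sparse convolution suits it: computation happens only at occupied voxels. The sketch below shows a 3×3×3 submanifold sparse convolution, a variant that produces outputs only at already-active sites. This particular variant is an assumption for illustration; the paper may use a different sparse-convolution library or variant. The function name `submanifold_sparse_conv3d` and its arguments are hypothetical.

```python
import numpy as np
from itertools import product

def submanifold_sparse_conv3d(coords, feats, weight, bias=None):
    """3x3x3 submanifold sparse convolution: outputs only at active voxels.

    Hypothetical sketch for illustration.
    coords: (M, 3) integer indices of occupied voxels (e.g. a voxelized body)
    feats:  (M, C_in) features at those voxels
    weight: (3, 3, 3, C_in, C_out) kernel
    bias:   optional (C_out,)
    Returns (M, C_out), aligned with `coords`.
    """
    # Hash map from voxel coordinate to row index, so neighbour lookup is O(1)
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = np.zeros((coords.shape[0], weight.shape[-1]))
    for i, c in enumerate(coords.tolist()):
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            j = index.get((c[0] + dx, c[1] + dy, c[2] + dz))
            if j is not None:  # empty neighbours contribute nothing
                out[i] += feats[j] @ weight[dx + 1, dy + 1, dz + 1]
    if bias is not None:
        out += bias
    return out
```

Because outputs stay on the same set of active voxels, stacking such layers does not dilate the occupied region. This keeps the features aligned with the body volume, from which an attention step can then pass them to the Gaussians.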

🔎 Similar Papers

GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers