🤖 AI Summary
Reconstructing complete, animatable 3D human avatars from monocular video remains challenging under severe occlusions: missing observations lead existing methods to produce geometric artifacts and temporal inconsistencies. This work proposes an approach that combines a multi-scale UV parameterization for robust geometry reconstruction with an identity-preserving diffusion inpainting module, which uses textual inversion and semantic guidance to recover occluded regions while preserving subject-specific detail and temporal coherence. Through hierarchical coarse-to-fine feature interpolation and direct pixel-level supervision, the method improves reconstruction quality on the PeopleSnapshot, ZJU-MoCap, and OcMotion datasets, yielding more complete geometry and temporally stable animations.
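The "multi-scale UV parameterization with hierarchical feature interpolation" can be pictured as sampling a pyramid of UV-space feature grids at each surface point and fusing the results coarse-to-fine. The sketch below is a minimal NumPy illustration under assumed choices (bilinear sampling, concatenation as the fusion, a hypothetical 3-level pyramid); the paper's actual resolutions, channel counts, and fusion scheme may differ.

```python
import numpy as np

def bilerp(grid, u, v):
    """Bilinearly sample an (H, W, C) feature grid at continuous UV coords in [0, 1]."""
    H, W, _ = grid.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0, x0] * (1 - fx) + grid[y0, x1] * fx
    bot = grid[y1, x0] * (1 - fx) + grid[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def multiscale_uv_feature(pyramid, u, v):
    """Concatenate features sampled from each level of a coarse-to-fine UV pyramid.

    Coarse levels give robust, smooth features for occluded regions; fine
    levels add geometric detail where observations exist.
    """
    return np.concatenate([bilerp(g, u, v) for g in pyramid])

# Hypothetical pyramid: 8x8, 32x32, 128x128 UV feature maps, 4 channels each.
rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(r, r, 4)).astype(np.float32) for r in (8, 32, 128)]
feat = multiscale_uv_feature(pyramid, u=0.37, v=0.81)
print(feat.shape)  # (12,): 3 levels x 4 channels
```

In practice such per-point features would be decoded (e.g. by a small MLP) into Gaussian or geometry parameters; that decoder is omitted here.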
📝 Abstract
Reconstructing complete and animatable 3D human avatars from monocular videos remains challenging, particularly under severe occlusions. While 3D Gaussian Splatting has enabled photorealistic human rendering, existing methods struggle with incomplete observations, often producing corrupted geometry and temporal inconsistencies. We present InpaintHuman, a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach introduces two key innovations: (i) a multi-scale UV-parameterized representation with hierarchical coarse-to-fine feature interpolation, enabling robust reconstruction of occluded regions while preserving geometric details; and (ii) an identity-preserving diffusion inpainting module that integrates textual inversion with semantic-conditioned guidance for subject-specific, temporally coherent completion. Unlike SDS-based methods, our approach employs direct pixel-level supervision to ensure identity fidelity. Experiments on benchmarks with synthetic occlusions (PeopleSnapshot, ZJU-MoCap) and real-world occlusions (OcMotion) demonstrate competitive performance with consistent improvements in reconstruction quality across diverse poses and viewpoints.
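The contrast with SDS is that, instead of backpropagating a diffusion score through the renderer, the inpainted image is treated as a direct per-pixel target. A minimal sketch of such a masked photometric loss, under assumed conventions (L1 error, a binary mask marking inpainted pixels; the function name and shapes are illustrative, not the paper's API):

```python
import numpy as np

def masked_pixel_loss(rendered, inpainted_target, mask):
    """Mean L1 photometric error over pixels flagged by the occlusion mask.

    rendered, inpainted_target: (H, W, 3) float images in [0, 1];
    mask: (H, W) in {0, 1}, where 1 marks pixels the diffusion model completed.
    """
    per_pixel = np.abs(rendered - inpainted_target).sum(axis=-1)  # L1 over RGB
    denom = max(mask.sum(), 1)  # avoid division by zero if nothing is masked
    return float((per_pixel * mask).sum() / denom)

# Toy example: supervise only the occluded (top) half of a 2x2 image.
rendered = np.zeros((2, 2, 3))
target = np.ones((2, 2, 3))
mask = np.array([[1.0, 1.0], [0.0, 0.0]])
print(masked_pixel_loss(rendered, target, mask))  # 3.0: |0 - 1| over 3 channels
```

Because the target is a fixed image rather than a noisy score estimate, gradients are low-variance and the inpainted appearance (and hence identity) is matched directly.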