Unify3D: An Augmented Holistic End-to-end Monocular 3D Human Reconstruction via Anatomy Shaping and Twins Negotiating

📅 2025-04-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Monocular 3D human reconstruction has long suffered from reliance on explicit intermediate geometric representations (e.g., SMPL parameters or voxels), resulting in incomplete end-to-end modeling, low geometric fidelity, and poor robustness to occlusion and complex clothing. This paper proposes the first end-to-end implicit reconstruction framework that requires no explicit geometric priors. We design an anatomy-aware implicit shape extraction module and introduce a dual-modality U-Net architecture to directly map RGB images to joint signed distance function (SDF) and neural radiance field (NeRF) representations. Additionally, we propose manga-style data augmentation and release a large-scale 3D human dataset comprising over 15,000 high-quality scans. Extensive experiments on multiple benchmarks and in-the-wild scenes demonstrate significant improvements over state-of-the-art methods: +23.6% in geometric detail fidelity and +31.4% in pose robustness—achieving, for the first time, high-fidelity, temporally consistent reconstructions under severe occlusion and complex clothing.

Technology Category

Application Category

📝 Abstract
Monocular 3D clothed human reconstruction aims to create a complete 3D avatar from a single image. To tackle the human geometry lacking in one RGB image, current methods typically resort to a preceding model for an explicit geometric representation. For the reconstruction itself, focus is on modeling both it and the input image. This routine is constrained by the preceding model, and overlooks the integrity of the reconstruction task. To address this, this paper introduces a novel paradigm that treats human reconstruction as a holistic process, utilizing an end-to-end network for direct prediction from 2D image to 3D avatar, eliminating any explicit intermediate geometry display. Based on this, we further propose a novel reconstruction framework consisting of two core components: the Anatomy Shaping Extraction module, which captures implicit shape features taking into account the specialty of human anatomy, and the Twins Negotiating Reconstruction U-Net, which enhances reconstruction through feature interaction between two U-Nets of different modalities. Moreover, we propose a Comic Data Augmentation strategy and construct 15k+ 3D human scans to bolster model performance in more complex case input. Extensive experiments on two test sets and many in-the-wild cases show the superiority of our method over SOTA methods. Our demos can be found in : https://e2e3dgsrecon.github.io/e2e3dgsrecon/.
Problem

Research questions and friction points this paper is trying to address.

Monocular 3D clothed human reconstruction from single image
Eliminating explicit intermediate geometry representation
Enhancing reconstruction with anatomy-aware feature interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end network for direct 2D to 3D prediction
Anatomy Shaping Extraction for implicit shape features
Twins Negotiating U-Net for multi-modal feature interaction
🔎 Similar Papers
No similar papers found.
N
Nanjie Yao
HKUST(GZ)
G
Gangjian Zhang
HKUST(GZ)
Wenhao Shen
Wenhao Shen
Nanyang Technological University
Computer Vision3D Vision
J
Jian Shu
HKUST(GZ)
H
Hao Wang
HKUST(GZ)