MultiGO++: Monocular 3D Clothed Human Reconstruction via Geometry-Texture Collaboration

📅 2026-03-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Monocular 3D clothed human reconstruction is often hindered by scarce texture data, inaccurate geometric priors, and the supervision bias inherent to single-modality learning, leading to suboptimal reconstruction quality. To address these limitations, this work proposes a geometry-texture collaborative reconstruction framework. We construct a large-scale dataset comprising over 15,000 textured 3D human scans and introduce a multi-source texture synthesis strategy, a region-aware shape extraction module, and a Fourier-based geometric encoding mechanism. A dual-branch U-Net architecture is further designed to effectively fuse geometry and texture features. By transcending the constraints of single-modality supervision, our method achieves state-of-the-art performance across multiple benchmarks and on in-the-wild images, enabling high-fidelity 3D reconstruction of clothed humans from a single image.

📝 Abstract
Monocular 3D clothed human reconstruction aims to generate a complete and realistic textured 3D avatar from a single image. Existing methods are commonly trained under multi-view supervision with annotated geometric priors; during inference, these priors are estimated from the monocular input by a pre-trained network. Such methods are constrained by three key limitations: texturally, by the unavailability of training data; geometrically, by inaccurate external priors; and systematically, by biased single-modality supervision, all of which lead to suboptimal reconstruction. To address these issues, we propose a novel reconstruction framework, named MultiGO++, which achieves effective, systematic geometry-texture collaboration. It consists of three core parts: (1) a multi-source texture synthesis strategy that constructs 15,000+ textured 3D human scans to improve texture estimation quality in challenging scenarios; (2) a region-aware shape extraction module that extracts per-region body features and models their interactions to obtain geometry information, together with a Fourier geometry encoder that mitigates the modality gap for effective geometry learning; and (3) a dual reconstruction U-Net that leverages geometry-texture collaborative features to refine and generate high-fidelity textured 3D human meshes. Extensive experiments on two benchmarks and many in-the-wild cases show the superiority of our method over state-of-the-art approaches.
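The abstract mentions a Fourier geometry encoder for bridging the modality gap between geometric and image features. The paper's actual encoder design is not detailed on this page, but a common form of such encoding (as popularized in NeRF-style positional encodings) maps raw 3D coordinates into a higher-dimensional sinusoidal feature space. The sketch below is an illustrative assumption of that general technique, not the authors' implementation; the function name and band layout are hypothetical.

```python
import numpy as np

def fourier_encode(xyz, num_bands=6):
    """Map 3D points to a higher-dimensional Fourier feature space.

    Hypothetical sketch of a generic Fourier geometry encoding; the
    paper's concrete encoder architecture is not specified here.
    """
    xyz = np.asarray(xyz, dtype=np.float64)       # (N, 3) input points
    freqs = 2.0 ** np.arange(num_bands)           # octave-spaced frequencies
    # Scale each coordinate by every frequency via broadcasting.
    scaled = xyz[..., None] * freqs               # (N, 3, num_bands)
    n = xyz.shape[0]
    # Concatenate raw coordinates with sin/cos features of all bands.
    feats = np.concatenate(
        [xyz,
         np.sin(scaled).reshape(n, -1),
         np.cos(scaled).reshape(n, -1)],
        axis=-1)
    return feats                                  # (N, 3 + 2 * 3 * num_bands)

pts = np.zeros((4, 3))
print(fourier_encode(pts).shape)  # (4, 39)
```

Lifting low-dimensional coordinates into such a sinusoidal basis is a standard way to let downstream networks represent high-frequency geometric detail, which is consistent with the abstract's stated goal of effective geometry learning.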
Problem

Research questions and friction points this paper is trying to address.

monocular 3D reconstruction
clothed human
geometry-texture collaboration
texture synthesis
geometric priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry-texture collaboration
multi-source texture synthesis
region-aware shape extraction
Fourier geometry encoder
dual reconstruction U-Net
Nanjie Yao
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511442, China
Gangjian Zhang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511442, China
Wenhao Shen
Nanyang Technological University
Computer Vision · 3D Vision
Jian Shu
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511442, China
Yu Feng
National University of Defense Technology
Learning theory · Kernel method · Clustering
Hao Wang
The Hong Kong University of Science and Technology
Machine Learning · Data Mining