🤖 AI Summary
Reconstructing high-fidelity 3D clothed human avatars from unconstrained in-the-wild 2D photographs, which may exhibit arbitrary poses, viewpoints, crops, and occlusions, remains a challenging open problem. This paper proposes UP2You, a template-free, tuning-free, single-pass reconstruction framework. Its core contributions are: (1) a data rectifier paradigm coupled with a pose-correlated feature aggregation (PCFA) module, which converts unconstrained inputs into clean orthogonal multi-view images within seconds and fuses multiple reference images with a nearly constant memory footprint; and (2) a Perceiver-based multi-reference shape prediction network that jointly optimizes geometry and texture via implicit neural representation and multi-view image distillation, removing the need for pre-captured body templates. On PuzzleIOI, the method reduces Chamfer distance and point-to-surface (P2S) error by 15% and 18%, respectively; on 4D-Dress, it improves PSNR by 21% and LPIPS by 46%. Reconstructing a single avatar takes only 1.5 minutes, significantly outperforming state-of-the-art approaches.
📝 Abstract
We present UP2You, the first tuning-free solution for reconstructing high-fidelity 3D clothed portraits from extremely unconstrained in-the-wild 2D photos. Unlike previous approaches that require "clean" inputs (e.g., full-body images with minimal occlusions, or well-calibrated cross-view captures), UP2You directly processes raw, unstructured photographs, which may vary significantly in pose, viewpoint, cropping, and occlusion. Instead of compressing data into tokens for slow online text-to-3D optimization, we introduce a data rectifier paradigm that efficiently converts unconstrained inputs into clean, orthogonal multi-view images in a single forward pass within seconds, simplifying the 3D reconstruction. Central to UP2You is a pose-correlated feature aggregation module (PCFA) that selectively fuses information from multiple reference images with respect to target poses, enabling better identity preservation and a nearly constant memory footprint as more observations are added. We also introduce a Perceiver-based multi-reference shape predictor, removing the need for pre-captured body templates. Extensive experiments on 4D-Dress, PuzzleIOI, and in-the-wild captures demonstrate that UP2You consistently surpasses previous methods in both geometric accuracy (Chamfer −15%, P2S −18% on PuzzleIOI) and texture fidelity (PSNR +21%, LPIPS −46% on 4D-Dress). UP2You is efficient (1.5 minutes per person) and versatile (supporting arbitrary pose control and training-free multi-garment 3D virtual try-on), making it practical for real-world scenarios where humans are casually captured. Both models and code will be released to facilitate future research on this underexplored task. Project Page: https://zcai0612.github.io/UP2You
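The paper does not spell out PCFA's internals, but the claimed property (fusing features from any number of reference images with respect to a target pose, at nearly constant memory) is characteristic of attention-style weighted aggregation computed in a streaming fashion. Below is a minimal, hypothetical sketch of that idea: each reference's features are weighted by the similarity of its pose embedding to the target pose, and the softmax-weighted sum is accumulated online so that memory does not grow with the number of references. The function name `pcfa_fuse`, the dot-product pose similarity, and the flat feature vectors are illustrative assumptions, not the authors' actual module.

```python
import math

def pcfa_fuse(ref_feats, ref_poses, target_pose):
    """Illustrative sketch (NOT the paper's implementation) of
    pose-correlated feature aggregation: weight each reference image's
    features by the similarity between its pose embedding and the
    target pose, fused via an online softmax so memory stays constant
    no matter how many references are streamed in.

    ref_feats:   list of feature vectors, one per reference image
    ref_poses:   list of pose embeddings, one per reference image
    target_pose: pose embedding of the target view
    """
    d = len(target_pose)
    acc = [0.0] * len(ref_feats[0])  # running weighted feature sum
    norm = 0.0                       # running softmax denominator
    m = -math.inf                    # running max logit (numerical stability)
    for feat, pose in zip(ref_feats, ref_poses):
        # scaled dot-product similarity between reference and target pose
        logit = sum(a * b for a, b in zip(pose, target_pose)) / math.sqrt(d)
        new_m = max(m, logit)
        # rescale previous accumulators when the running max changes
        scale = math.exp(m - new_m) if math.isfinite(m) else 0.0
        w = math.exp(logit - new_m)
        acc = [a * scale + w * f for a, f in zip(acc, feat)]
        norm = norm * scale + w
        m = new_m
    return [a / norm for a in acc]
```

Because each reference only updates the fixed-size accumulators `acc`, `norm`, and `m`, adding more input photos costs extra compute but no extra memory, which matches the abstract's "nearly constant memory footprint" claim.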