VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

📅 2025-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses an open challenge in virtual try-on (VTON): existing methods lack 3D consistency and cannot render from arbitrary viewpoints. We propose the first end-to-end 3D-consistent multi-view VTON framework. Methodologically, we introduce a pseudo-3D pose representation derived from SMPL-X normal maps, a multi-view spatial cross-attention mechanism, and an enhanced multi-view CLIP embedding conditioned on camera parameters—integrated within a diffusion-based generative architecture for joint multi-view synthesis. Unlike existing 2D or weakly 3D approaches, our method achieves, for the first time on real-world e-commerce datasets, high-fidelity, geometrically consistent try-on renderings under full 360° free-view control. Quantitative and qualitative evaluations demonstrate state-of-the-art performance in both visual quality and 3D consistency.
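The pseudo-3D pose cue comes from rendering SMPL-X surface normals into image space. The rendering pipeline itself is not detailed here, but the standard encoding that turns unit normals into a normal-map image can be sketched as follows (`normals_to_rgb` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def normals_to_rgb(normals):
    """Encode unit surface normals (components in [-1, 1]) as RGB colors
    in [0, 1] -- the standard normal-map encoding. A rendered SMPL-X
    normal map of this form serves as the pseudo-3D pose representation.

    normals: array of shape (..., 3)
    """
    # Normalize defensively, then shift/scale [-1, 1] -> [0, 1] per channel.
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    return (n + 1.0) / 2.0
```

Because the encoding preserves surface orientation per pixel, the diffusion model receives 3D geometry cues without an explicit 3D representation.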

📝 Abstract
Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals. In this work, we propose VTON 360, a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports any-view rendering. Specifically, we leverage the equivalence between a 3D model and its rendered multi-view 2D images, and reformulate 3D VTON as an extension of 2D VTON that ensures 3D consistent results across multiple views. To achieve this, we extend 2D VTON models to include multi-view garments and clothing-agnostic human body images as input, and propose several novel techniques to enhance them, including: i) a pseudo-3D pose representation using normal maps derived from the SMPL-X 3D human model, ii) a multi-view spatial attention mechanism that models the correlations between features from different viewing angles, and iii) a multi-view CLIP embedding that enhances the garment CLIP features used in 2D VTON with camera information. Extensive experiments on large-scale real datasets and clothing images from e-commerce platforms demonstrate the effectiveness of our approach. Project page: https://scnuhealthy.github.io/VTON360.
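The multi-view spatial attention described in (ii) can be illustrated with a minimal single-head sketch: tokens from all views are concatenated into one sequence so that each spatial location can attend across viewing angles. This is an assumption-laden simplification (one head, plain numpy, hypothetical function name), not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_spatial_attention(feats, Wq, Wk, Wv):
    """Single-head attention over the joint token sequence of all views.

    feats: (V, N, C) -- V views, N spatial tokens per view, C channels.
    Wq, Wk, Wv: (C, C) projection matrices.
    Returns: (V, N, C), where each token has attended to tokens from
    every view, modeling cross-view feature correlations.
    """
    V, N, C = feats.shape
    tokens = feats.reshape(V * N, C)              # concatenate views
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)  # (V*N, V*N)
    return (attn @ v).reshape(V, N, C)
```

Sharing one attention map over all views is what lets the denoiser keep garment details consistent when the same surface point is visible from several cameras.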
Problem

Research questions and friction points this paper is trying to address.

How to achieve high-fidelity virtual try-on renderings from arbitrary viewing directions.
How to extend 2D VTON models so that results remain 3D-consistent across multiple views.
How to make garment features view-aware, since standard CLIP embeddings carry no camera information.
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D VTON using multi-view 2D images
Pseudo-3D pose with SMPL-X normal maps
Multi-view spatial attention and CLIP embedding
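The camera-conditioned CLIP embedding can be sketched as: encode the camera pose, project it into the CLIP feature space, and fuse it with the garment feature. The sinusoidal azimuth encoding and additive fusion below are assumptions for illustration; the paper states only that the CLIP features are enhanced with camera information:

```python
import numpy as np

def camera_encoding(azimuth_deg, dim):
    """Sinusoidal encoding of the viewing azimuth (hypothetical scheme;
    dim must be even)."""
    freqs = 2.0 ** np.arange(dim // 2)
    angle = np.deg2rad(azimuth_deg)
    return np.concatenate([np.sin(freqs * angle), np.cos(freqs * angle)])

def camera_aware_clip_embedding(clip_feat, azimuth_deg, W_cam):
    """Project the camera encoding into CLIP feature space and add it to
    the garment CLIP feature, making the conditioning view-dependent.

    clip_feat: (D,) garment CLIP feature.
    W_cam: (E, D) learned projection from camera encoding to CLIP space.
    """
    cam = camera_encoding(azimuth_deg, W_cam.shape[0])
    return clip_feat + cam @ W_cam
```

With this fusion, the same garment yields a different conditioning vector per camera, so the diffusion model can render view-appropriate garment appearance.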