HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Virtual try-on faces three core challenges: geometric distortion across poses, semantic inconsistency in garment structure and texture, and loss of fine-grained detail. To address these, we propose a synergistic framework comprising the Appearance-Preserving Warp Alignment Module (APWAM), the Semantic Representation and Comprehension Module (SRCM), and the Multimodal Prior-Guided Appearance Generation Module (MPAGM), enabling joint modeling of geometric deformation and semantic structure consistency. We further introduce SAMP-VTONS, a benchmark dataset explicitly designed for multi-pose evaluation with rich textual annotations, and integrate pretrained vision-language models with pose-aware deformation modeling. Extensive experiments demonstrate state-of-the-art performance on both VITON-HD and SAMP-VTONS, with significant improvements in image fidelity, clothing structural and textural consistency, and local detail recovery.

📝 Abstract
Virtual try-on technology has become increasingly important in the fashion and retail industries, enabling the generation of high-fidelity garment images that adapt seamlessly to target human models. While existing methods have achieved notable progress, they still face significant challenges in maintaining consistency across different poses. Specifically, geometric distortions lead to a lack of spatial consistency, mismatches in garment structure and texture across poses result in semantic inconsistency, and the loss or distortion of fine-grained details diminishes visual fidelity. To address these challenges, we propose HF-VTON, a novel framework that ensures high-fidelity virtual try-on performance across diverse poses. HF-VTON consists of three key modules: (1) the Appearance-Preserving Warp Alignment Module (APWAM), which aligns garments to human poses, addressing geometric deformations and ensuring spatial consistency; (2) the Semantic Representation and Comprehension Module (SRCM), which captures fine-grained garment attributes and multi-pose data to enhance semantic representation, maintaining structural, textural, and pattern consistency; and (3) the Multimodal Prior-Guided Appearance Generation Module (MPAGM), which integrates multimodal features and prior knowledge from pre-trained models to optimize appearance generation, ensuring both semantic and geometric consistency. Additionally, to overcome data limitations in existing benchmarks, we introduce the SAMP-VTONS dataset, featuring multi-pose pairs and rich textual annotations for a more comprehensive evaluation. Experimental results demonstrate that HF-VTON outperforms state-of-the-art methods on both VITON-HD and SAMP-VTONS, excelling in visual fidelity, semantic consistency, and detail preservation.
Problem

Research questions and friction points this paper is trying to address.

Maintaining spatial consistency in virtual try-on across poses
Ensuring semantic consistency in garment structure and texture
Preserving fine-grained details for high visual fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

APWAM aligns garments to poses spatially
SRCM enhances semantic garment representation
MPAGM integrates multimodal features for generation
Authors

Ming Meng
School of Data Science and Media Intelligence, Communication University of China, Beijing, China
Qi Dong
Amazon, AWS Rekognition
Computer Vision, Machine Learning, Artificial Intelligence
Jiajie Li
University at Buffalo
Computer Science, Machine Learning, Artificial Intelligence
Zhe Zhu
Samsung Research America, USA
Xingyu Wang
Nanjing University of Posts and Telecommunications
NLP
Zhaoxin Fan
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, Beijing, China; Hangzhou International Innovation Institute, Beihang University, Beijing, China
Wei Zhao
School of Data Science and Media Intelligence, Communication University of China, Beijing, China
Wenjun Wu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, Beijing, China