Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments

📅 2025-06-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address two key bottlenecks in virtual try-on for loose-fitting garments—semantic map failure under occlusion and temporal flickering caused by single-frame synthesis—this paper proposes a two-stage robust semantic map estimation and recurrent garment synthesis framework. Methodologically: (1) it introduces garment-invariant feature extraction and an auxiliary semantic map estimation network to enhance robustness in inferring occluded human structure; (2) it designs a lightweight RNN-based recurrent generation architecture to explicitly model inter-frame temporal dependencies. While maintaining real-time rendering at over 30 FPS, our method significantly outperforms state-of-the-art approaches in both image fidelity (18% lower LPIPS) and temporal consistency (32% lower FVD). To the best of our knowledge, it is the first to achieve high-fidelity, low-flicker, per-garment virtual try-on for loose clothing in real time.

Technology Category

Application Category

📝 Abstract
Per-garment virtual try-on methods collect garment-specific datasets and train networks tailored to each garment to achieve superior results. However, these approaches often struggle with loose-fitting garments due to two key limitations: (1) They rely on human body semantic maps to align garments with the body, but these maps become unreliable when body contours are obscured by loose-fitting garments, resulting in degraded outcomes; (2) They train garment synthesis networks on a per-frame basis without utilizing temporal information, leading to noticeable jittering artifacts. To address these challenges, we propose a two-stage approach for robust semantic map estimation. First, we extract a garment-invariant representation from the raw input image. This representation is then passed through an auxiliary network to estimate the semantic map. This enhances the robustness of semantic map estimation under loose-fitting garments during garment-specific dataset generation. Furthermore, we introduce a recurrent garment synthesis framework that incorporates temporal dependencies to improve frame-to-frame coherence while maintaining real-time performance. We conducted qualitative and quantitative evaluations to demonstrate that our method outperforms existing approaches in both image quality and temporal coherence. Ablation studies further validate the effectiveness of the garment-invariant representation and the recurrent synthesis framework.
Problem

Research questions and friction points this paper is trying to address.

Improving semantic map accuracy for loose-fitting garments
Reducing jittering artifacts with temporal information
Enhancing real-time virtual try-on coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Garment-invariant representation for robust semantic maps
Recurrent synthesis framework for temporal coherence
Two-stage approach for loose-fitting garment alignment
🔎 Similar Papers
No similar papers found.