🤖 AI Summary
To address temporally inconsistent garments and distorted fine details caused by dynamic human motion in long-video virtual try-on, this paper proposes RealVVT, a photorealistic video virtual try-on framework built on pre-existing video foundation models. Methodologically, it introduces a Clothing&Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss for spatial consistency, and a Pose-guided Long Video VTO technique for extended sequences. Experiments across various datasets show that the method outperforms existing state-of-the-art models on both single-image and video try-on tasks, making it a viable option for fashion e-commerce and virtual fitting applications.
📝 Abstract
Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body. Despite notable progress in single-image virtual try-on (VTO), current methods often struggle to preserve a consistent and authentic clothing appearance across extended video sequences. This challenge arises from the difficulty of capturing dynamic human poses while maintaining the characteristics of the target clothing. We leverage pre-existing video foundation models to introduce RealVVT, a photoRealistic Video Virtual Try-on framework designed to improve stability and realism in dynamic video contexts. Our method comprises a Clothing&Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss that enforces spatial consistency, and a Pose-guided Long Video VTO technique for handling extended video sequences. Extensive experiments across various datasets confirm that our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks, offering a viable solution for practical applications in fashion e-commerce and virtual fitting environments.