🤖 AI Summary
Existing virtual try-on methods struggle under real-world conditions such as extreme poses, illumination variations, and motion blur, and often fail to balance high-fidelity detail preservation with generalization across diverse garment categories. This work proposes an end-to-end virtual try-on system that integrates an optimized generative architecture, a multi-stage training strategy, a scalable data engine, and an efficient inference framework. The system supports flexible composition of up to six reference images across eight fashion categories, with coordinated control over person identity and background. It maintains a high generation success rate and preserves fine-grained garment detail under challenging conditions while delivering near real-time inference. The system has been deployed at industrial scale in the Taobao App, serving millions of users and handling tens of millions of requests. Extensive evaluation shows leading overall performance, and a comprehensive benchmark is publicly released to support future research.
📝 Abstract
Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our system maintains a high success rate across challenging cases such as extreme poses, severe illumination variations, motion blur, and other in-the-wild conditions. Second, it delivers highly photorealistic results with fine-grained details, faithfully preserving garment texture, material properties, and structural characteristics while largely avoiding common AI-generated artifacts. Third, beyond apparel try-on, our model supports flexible multi-image composition (up to 6 reference images) across 8 fashion categories, with coordinated control over person identity and background. Fourth, to overcome the latency bottlenecks of commercial deployment, our system is heavily optimized for inference speed, delivering near real-time generation for a seamless user experience. These capabilities are enabled by an integrated system design spanning an end-to-end model architecture, a scalable data engine, robust infrastructure, and a multi-stage training paradigm. Extensive evaluation and large-scale product deployment demonstrate that Tstars-Tryon 1.0 achieves leading overall performance. To support future research, we also release a comprehensive benchmark. The model has been deployed at industrial scale on the Taobao App, serving millions of users and handling tens of millions of requests.