🤖 AI Summary
General-purpose image editing models often fail to keep product appearance and layout consistent in e-commerce scenarios. To address this, the paper proposes TBStar-Edit, a domain-specific image editing model. Methodologically: (1) we establish a data construction pipeline tailored to e-commerce, covering collection, construction, filtering, and augmentation, that yields high-quality, instruction-following, strongly consistent editing data; (2) we design a hierarchical architecture comprising a base model, pattern shifting modules, and consistency enhancement modules; and (3) we adopt a two-stage training strategy in which the first stage targets editing pattern shifting and the second targets consistency enhancement, with different modules and separate datasets in each stage. Evaluation combines an objective metric (VIE Score) with subjective user preference. On our curated e-commerce benchmark, TBStar-Edit outperforms state-of-the-art general-purpose models, with a reported 12.7% improvement in VIE Score and an 89.3% win rate in subjective user preference.
📝 Abstract
Recent advances in image generation and editing have enabled state-of-the-art models to achieve impressive results in general domains. When applied to e-commerce scenarios, however, these general models often struggle to keep the edited image consistent with the original product. To address this challenge, we introduce TBStar-Edit, a new image editing model tailored for the e-commerce domain. Through rigorous data engineering, model architecture design, and training strategy, TBStar-Edit achieves precise, high-fidelity image editing while preserving the integrity of product appearance and layout. For data engineering, we establish a comprehensive data construction pipeline, encompassing data collection, construction, filtering, and augmentation, that yields high-quality, instruction-following, and strongly consistent editing data for model training. For model architecture, we build a hierarchical framework consisting of a base model, pattern shifting modules, and consistency enhancement modules. For model training, we adopt a two-stage strategy to strengthen consistency preservation: the first stage targets editing pattern shifting, and the second targets consistency enhancement. Each stage trains different modules on a separate dataset. Finally, we evaluate TBStar-Edit extensively on an e-commerce benchmark that we propose; the results show that TBStar-Edit outperforms existing general-domain editing models on both the objective metric (VIE Score) and subjective user preference.
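The two-stage strategy described in the abstract, where each stage trains different modules on a separate dataset, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the module and dataset names, the `Module` and `Stage` classes, and in particular the assumptions that the base model stays frozen throughout and that the pattern shifting modules are frozen again in the second stage. The abstract specifies none of these details, so this is only one plausible reading, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Stand-in for a group of model parameters (hypothetical)."""
    name: str
    trainable: bool = False


@dataclass
class Stage:
    """One training stage: the modules it updates and the dataset it uses."""
    name: str
    train_modules: tuple
    dataset: str


def configure_stage(modules, stage):
    """Freeze every module, then unfreeze only those this stage trains."""
    for module in modules.values():
        module.trainable = module.name in stage.train_modules
    return [name for name, module in modules.items() if module.trainable]


# Hypothetical module names mirroring the three components in the abstract.
modules = {
    name: Module(name)
    for name in ("base_model", "pattern_shifting", "consistency_enhancement")
}

# Assumption: each stage trains exactly one module group on its own dataset.
stages = [
    Stage("stage1_pattern_shifting", ("pattern_shifting",), "pattern_shifting_data"),
    Stage("stage2_consistency_enhancement", ("consistency_enhancement",), "consistency_data"),
]

# Map each stage to the list of modules that would receive gradient updates.
schedule = {stage.name: configure_stage(modules, stage) for stage in stages}
```

In a real training loop, `configure_stage` would correspond to toggling gradient updates on the relevant parameter groups before running the optimizer on that stage's dataset.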