🤖 AI Summary
Existing AIGC methods for reference-guided ad image generation (producing an advertising image from a product image and a textual scene description) struggle to balance fidelity and efficiency: fine-tuning-based approaches require costly per-product adaptation, while tuning-free methods exhibit poor cross-product fidelity, hindering e-commerce deployment. This paper proposes RefAdGen, a diffusion-based framework that decouples product appearance from scene semantics by injecting a product mask at the U-Net input and integrating product features through an Attention Fusion Module (AFM), enabling efficient, high-fidelity generation. The authors also introduce AdProd-100K, a large-scale advertising image generation dataset built with a dual data augmentation strategy that fosters robust, 3D-aware representations and enhances geometric and semantic diversity. Experiments demonstrate that RefAdGen achieves state-of-the-art performance across multiple product categories, unseen items, and real-world, in-the-wild images, significantly improving generation fidelity, visual quality, and cross-domain generalization. The framework exhibits strong industrial applicability for scalable, high-quality ad creation in e-commerce.
📝 Abstract
The rapid advancement of Artificial Intelligence Generated Content (AIGC) techniques has unlocked opportunities in generating diverse and compelling advertising images based on referenced product images and textual scene descriptions. This capability substantially reduces human labor and production costs in traditional marketing workflows. However, existing AIGC techniques either demand extensive fine-tuning for each referenced image to achieve high fidelity, or they struggle to maintain fidelity across diverse products, making them impractical for e-commerce and marketing industries. To tackle these limitations, we first construct AdProd-100K, a large-scale advertising image generation dataset. A key innovation in its construction is our dual data augmentation strategy, which fosters robust, 3D-aware representations crucial for realistic and high-fidelity image synthesis. Leveraging this dataset, we propose RefAdGen, a generation framework that achieves high fidelity through a decoupled design. The framework enforces precise spatial control by injecting a product mask at the U-Net input, and employs an efficient Attention Fusion Module (AFM) to integrate product features. This design effectively resolves the fidelity-efficiency dilemma present in existing methods. Extensive experiments demonstrate that RefAdGen achieves state-of-the-art performance, showcasing robust generalization by maintaining high fidelity and remarkable visual results for both unseen products and challenging real-world, in-the-wild images. This offers a scalable and cost-effective alternative to traditional workflows. Code and datasets are publicly available at https://github.com/Anonymous-Name-139/RefAdgen.
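The abstract names two mechanisms: mask injection at the U-Net input for spatial control, and an Attention Fusion Module (AFM) for integrating product features. The paper's exact formulation is not given here, so the sketch below is a hypothetical, minimal NumPy illustration of the general pattern such designs typically follow: the mask is concatenated as an extra input channel, and the AFM is modeled as a cross-attention block where scene tokens (queries) attend to product tokens (keys/values) with a residual connection. All function names, shapes, and the residual-fusion choice are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inject_mask(latents, mask):
    """Spatial control (assumed form): append the binary product mask
    as one extra channel of the U-Net input tensor."""
    return np.concatenate([latents, mask[None]], axis=0)  # (C+1, H, W)

def attention_fusion(scene_tokens, product_tokens, d=16):
    """Hypothetical AFM: cross-attention where scene tokens query
    product tokens, fused back via a residual connection."""
    c = scene_tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((c, d)) / np.sqrt(c) for _ in range(3))
    Wo = rng.standard_normal((d, c)) / np.sqrt(d)  # project back to scene dim
    q, k, v = scene_tokens @ Wq, product_tokens @ Wk, product_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (n_scene, n_product)
    return scene_tokens + attn @ v @ Wo    # shape preserved: (n_scene, c)

latents = rng.standard_normal((4, 8, 8))            # toy 4-channel latent
mask = (rng.random((8, 8)) > 0.5).astype(float)     # toy product mask
x = inject_mask(latents, mask)
fused = attention_fusion(rng.standard_normal((64, 32)),   # 64 scene tokens
                         rng.standard_normal((10, 32)))   # 10 product tokens
print(x.shape, fused.shape)  # (5, 8, 8) (64, 32)
```

The appeal of this decoupled pattern, as the abstract argues, is that product identity enters only through attention (no per-product fine-tuning), while the mask channel pins down where the product appears in the scene.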