Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models

📅 2025-03-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of low fidelity and semantic distortion in e-commerce product image background replacement. We propose an end-to-end recontextualization framework built upon text-to-image diffusion models. Methodologically, we introduce the first integrated data synthesis pipeline combining image-to-video diffusion, intelligent inpainting/outpainting, and negative-sample augmentation, alongside a product representation disentanglement mechanism to jointly optimize structural consistency and attribute fidelity. Experiments on the ABO and proprietary e-commerce datasets demonstrate substantial improvements: FID decreases by 32%, CLIP-Score increases by 18%, and human evaluations show 41% and 53% gains in realism and product consistency, respectively—outperforming state-of-the-art methods. Our core contributions are: (1) the first controllable generation paradigm tailored for product recontextualization; (2) a disentangled product representation learning mechanism; and (3) a multi-stage synthesis strategy that jointly ensures photorealism and semantic consistency.

Technology Category

Application Category

📝 Abstract
We present a framework for high-fidelity product image recontextualization using text-to-image diffusion models and a novel data augmentation pipeline. This pipeline leverages image-to-video diffusion, in/outpainting&negatives to create synthetic training data, addressing limitations of real-world data collection for this task. Our method improves the quality and diversity of generated images by disentangling product representations and enhancing the model's understanding of product characteristics. Evaluation on the ABO dataset and a private product dataset, using automated metrics and human assessment, demonstrates the effectiveness of our framework in generating realistic and compelling product visualizations, with implications for applications such as e-commerce and virtual product showcasing.
Problem

Research questions and friction points this paper is trying to address.

High-fidelity product image recontextualization using diffusion models.
Addressing limitations of real-world data collection with synthetic training data.
Improving quality and diversity of generated product visualizations for e-commerce.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-image diffusion models for recontextualization
Novel data augmentation with image-to-video diffusion
Disentangling product representations for enhanced quality
🔎 Similar Papers
No similar papers found.
I
Ishaan Malhi
Google DeepMind
P
Praneet Dutta
Google DeepMind
E
Ellie Talius
Google DeepMind
S
Sally Ma
Google DeepMind
B
Brendan Driscoll
Google
K
Krista Holden
Google
G
Garima Pruthi
Google
Arunachalam Narayanaswamy
Arunachalam Narayanaswamy
Software Engineer, Google Inc.
Computer visionImage analysisMachine learning