Pinterest Canvas: Large-Scale Image Generation at Pinterest

📅 2026-03-06

🤖 AI Summary

General-purpose image generation models often fall short of product-level demands for high controllability and customized editing. To address this limitation, this work proposes a rapid adaptation paradigm built on a unified multimodal foundational diffusion model: a base model is first trained on a large-scale, diverse multimodal dataset, and task-specific variants are then fine-tuned for downstream applications such as background enhancement and aspect-ratio outpainting. This approach balances generation quality with product-oriented controllability. In online A/B experiments, images produced by the background enhancement and aspect-ratio outpainting variants receive 18.0% and 12.5% engagement lifts, respectively. Human evaluations indicate that these models outperform third-party models on these tasks, and the framework extends to other scenarios, including multi-image scene synthesis and image-to-video generation.

📝 Abstract

While recent image generation models demonstrate a remarkable ability to handle a wide variety of image generation tasks, this flexibility makes them hard to control via prompting or simple inference adaptation alone, rendering them unsuitable for use cases with strict product requirements. In this paper, we introduce Pinterest Canvas, our large-scale image generation system built to support image editing and enhancement use cases at Pinterest. Canvas is first trained on a diverse, multimodal dataset to produce a foundational diffusion model with broad image-editing capabilities. Rather than relying on one generic model to handle every downstream task, we rapidly fine-tune variants of this base model on task-specific datasets, producing specialized models for individual use cases. We describe key components of Canvas and summarize our best practices for dataset curation, training, and inference. We also showcase task-specific variants through case studies on background enhancement and aspect-ratio outpainting, highlighting how we tackle their specific product requirements. Online A/B experiments demonstrate that our enhanced images receive significant engagement lifts of 18.0% and 12.5%, respectively, and comparisons by human raters further validate that our models outperform third-party models on these tasks. Finally, we showcase other Canvas variants, including multi-image scene synthesis and image-to-video generation, demonstrating that our approach can generalize to a wide variety of potential downstream tasks.
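
To make the reported engagement lifts concrete, the sketch below shows one common way such a figure could be computed from an A/B experiment. It assumes lift means the relative change in an engagement rate between a control arm and a treatment arm; the abstract does not state the exact metric or how significance was tested, and the function name and input rates here are hypothetical illustrations, not values from the paper.

```python
def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the control rate.

    Assumption: "lift" is (treatment - control) / control. The paper's
    actual engagement metric is not specified in the abstract.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate


# Hypothetical engagement rates, chosen only to illustrate the arithmetic.
control = 0.100    # engagement rate for original images
treatment = 0.118  # engagement rate for enhanced images

print(f"Relative lift: {relative_lift(control, treatment):.1%}")
```

Under this definition, an 18.0% lift means the treatment arm's engagement rate is 1.18 times the control arm's rate, not that it is 18 percentage points higher.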

Problem

Research questions and friction points this paper is trying to address.

image generation

controllability

product requirements

image editing

large-scale systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

task-specific fine-tuning

diffusion model

image editing