Context Diffusion: In-Context Aware Image Generation

📅 2023-12-06
🏛️ European Conference on Computer Vision
📈 Citations: 14
✨ Influential: 0
🤖 AI Summary
Existing context-based image generation models rely heavily on textual prompts; in their absence, image quality and contextual fidelity degrade significantly, preventing purely visual analogical learning. Method: The paper proposes a few-shot in-context image generation framework that operates entirely without text, introducing a dual-path diffusion encoder that disentangles visual-context and query representations. It employs cross-sample attention and conditional feature alignment to model semantic context and geometric structure separately. Contribution/Results: The method achieves analogical image generation from example images alone, requiring no textual prompts. It substantially outperforms baselines on both in-domain and out-of-domain tasks in image fidelity and semantic consistency, and a user study confirms superior perceptual quality, demonstrating reliable and efficient visual in-context learning without text prompts.
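The dual-path idea in the summary, conditioning a denoising step on context examples through one path and on the query image's structure through another, can be sketched in a few lines. Everything below (function names, token shapes, additive fusion) is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # queries: (n_q, d); keys_values: (n_kv, d)
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys_values

def denoise_step(noisy_latent, query_structure, context_tokens):
    """One hypothetical conditioning step with two separate paths:
    the context path injects example semantics via cross-attention,
    while the query path injects structure as a direct residual.
    Either path can be a zero tensor, mimicking text-free or
    context-free operation."""
    # Context path: latent tokens attend over context-example tokens.
    ctx = cross_attention(noisy_latent, context_tokens)   # (n, d)
    # Query path: structure signal fused additively with the latent.
    return noisy_latent + ctx + query_structure           # (n, d)

rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 8))     # 16 latent tokens, dim 8
structure = rng.standard_normal((16, 8))  # encoded query-image structure
context = rng.standard_normal((4, 8))     # tokens from context examples
out = denoise_step(latent, structure, context)
```

Keeping the two paths separate until fusion is what lets such a model fall back gracefully when one conditioning signal is absent: dropping the context tokens (or the structure residual) degrades only that path rather than the whole conditioning.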
๐Ÿ“ Abstract
We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is not present, demonstrating that these models are unable to truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context from the preservation of the query image's structure. This enables learning from both the visual context and text prompts, but also from either one alone. Furthermore, we enable our model to handle few-shot settings, to effectively address diverse in-context learning scenarios. Our experiments and user study demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, resulting in an overall enhancement in image quality and fidelity compared to counterpart models.
Problem

Research questions and friction points this paper is trying to address.

Enables image generation from visual context without text prompts
Improves quality and fidelity in few-shot in-context learning
Separates visual context encoding and image layout preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Separates visual context and image layout encoding
Enables learning from visual context or prompts
Supports few-shot in-context learning scenarios