DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter

📅 2022-11-21
📈 Citations: 21
Influential: 1
📄 PDF
🤖 AI Summary
To address the inherent trade-off between feature fidelity and generation diversity in text-guided image synthesis from a single reference image, this paper proposes a novel positive–negative adapter collaborative tuning framework. Integrated into pre-trained diffusion models (e.g., Stable Diffusion), it enables one-shot, high-fidelity, and highly controllable style or concept transfer. Methodologically, we introduce the first positive–negative prompt tuning mechanism: a positive adapter actively extracts salient features from the reference image to steer diverse generation, while a negative adapter dynamically suppresses undesired biases; this is further enhanced by contrastive prompt optimization and latent-space feature disentanglement. Extensive experiments demonstrate state-of-the-art performance across style cloning, concept composition, and prompt editing tasks. Quantitatively, our method improves image fidelity, generation diversity, and control accuracy by 12.6%, 9.3%, and 18.4%, respectively, over prior approaches.
📝 Abstract
State-of-the-arts text-to-image generation models such as Imagen and Stable Diffusion Model have succeed remarkable progresses in synthesizing high-quality, feature-rich images with high resolution guided by human text prompts. Since certain characteristics of image content emph{e.g.}, very specific object entities or styles, are very hard to be accurately described by text, some example-based image generation approaches have been proposed, emph{i.e.} generating new concepts based on absorbing the salient features of a few input references. Despite of acknowledged successes, these methods have struggled on accurately capturing the reference examples' characteristics while keeping diverse and high-quality image generation, particularly in the one-shot scenario (emph{i.e.} given only one reference). To tackle this problem, we propose a simple yet effective framework, namely DreamArtist, which adopts a novel positive-negative prompt-tuning learning strategy on the pre-trained diffusion model, and it has shown to well handle the trade-off between the accurate controllability and fidelity of image generation with only one reference example. Specifically, our proposed framework incorporates both positive and negative embeddings or adapters and optimizes them in a joint manner. The positive part aggressively captures the salient characteristics of the reference image to drive diversified generation and the negative part rectifies inadequacies from the positive part. We have conducted extensive experiments and evaluated the proposed method from image similarity (fidelity) and diversity, generation controllability, and style cloning. And our DreamArtist has achieved a superior generation performance over existing methods. Besides, our additional evaluation on extended tasks, including concept compositions and prompt-guided image editing, demonstrates its effectiveness for more applications.
Problem

Research questions and friction points this paper is trying to address.

Accurately capturing reference characteristics in one-shot image generation
Balancing generation fidelity and diversity with single reference
Improving controllability of text-to-image models for specific concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Positive-negative prompt-tuning learning strategy
Joint optimization of positive and negative embeddings
One-shot reference image feature capture
🔎 Similar Papers
No similar papers found.