🤖 AI Summary
Fine-tuning text-to-image diffusion models often erroneously associates user prompts with undesirable concepts, leading to uncontrollable generation. To address this, we propose an embedding-space regularization method that requires no additional training: it dynamically decouples specified harmful concepts from user prompts during fine-tuning, guided solely by natural language instructions. Our approach imposes editable semantic constraints directly within the model's joint text-image embedding space, enabling real-time, flexible addition or removal of blocked concepts without architectural modification. Experiments across diverse interference scenarios demonstrate that our method significantly outperforms existing approaches, effectively suppressing spurious concept learning while preserving both customization fidelity and generalization capability. This enhances the safety and controllability of diffusion model fine-tuning.
📝 Abstract
Text-to-image diffusion models can generate diverse content from flexible prompts, which makes them well-suited for customization through fine-tuning on a small amount of user-provided data. However, controllable fine-tuning that prevents models from learning undesired concepts present in the fine-tuning data, and from entangling those concepts with user prompts, remains an open challenge. Such control is crucial for downstream tasks like bias mitigation, preventing malicious adaptation, attribute disentanglement, and generalizable fine-tuning of diffusion policies. We propose Coffee, which allows using language to specify undesired concepts that regularize the adaptation process. The crux of our method lies in keeping the embeddings of the user prompt from aligning with undesired concepts. Crucially, Coffee requires no additional training and enables flexible modification of undesired concepts simply by editing their textual descriptions. We evaluate Coffee by fine-tuning on images in which user prompts are paired with undesired concepts. Experimental results demonstrate that Coffee prevents text-to-image models from learning specified undesired concepts during fine-tuning and outperforms existing methods. Code will be released upon acceptance.
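The core idea of keeping user-prompt embeddings from aligning with undesired-concept embeddings can be sketched as a simple similarity penalty added to the fine-tuning loss. The sketch below is illustrative only, assuming cosine similarity in the model's text embedding space; the function names, the squared-cosine form, and the `weight` hyperparameter are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def decoupling_penalty(prompt_emb, concept_embs, weight=1.0):
    """Hypothetical regularizer: penalize alignment between the user-prompt
    embedding and each undesired-concept embedding. The squared cosine makes
    the penalty sign-agnostic, vanishing only when the prompt embedding is
    orthogonal to every undesired concept."""
    return weight * sum(cosine_similarity(prompt_emb, c) ** 2
                        for c in concept_embs)

# Toy example: a 2-D "prompt" embedding and one undesired concept.
prompt = np.array([1.0, 0.0])
aligned_concept = [np.array([1.0, 0.0])]    # fully aligned -> maximal penalty
orthogonal_concept = [np.array([0.0, 1.0])] # decoupled -> zero penalty
```

In an actual fine-tuning loop, such a term would be added to the denoising objective, with the concept embeddings recomputed from their textual descriptions whenever the blocked-concept list changes, which is what allows editing constraints without retraining.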