AI Summary
Personalized text-to-image diffusion models struggle to simultaneously preserve subject fidelity and support fine-grained textual editing. Method: We propose a learning-free, low-overhead personalized guidance approach that introduces a pretrained weak model conditioned on the null prompt. By dynamically interpolating weights between this weak model and the fine-tuned model in latent space, our method explicitly balances subject fidelity and text alignment, synergistically integrating classifier-free guidance and self-guidance. Contribution/Results: The approach requires no additional parameters or training and is compatible with diverse fine-tuning strategies. Experiments demonstrate that it preserves the base model's broad textual understanding while significantly improving subject fidelity and text-image alignment, enabling high-quality, highly controllable personalized image generation.
Abstract
Personalizing text-to-image diffusion models is crucial for adapting pre-trained models to specific target concepts, enabling diverse image generation. However, fine-tuning with few images introduces an inherent trade-off between aligning with the target distribution (e.g., subject fidelity) and preserving the broad knowledge of the original model (e.g., text editability). Existing sampling guidance methods, such as classifier-free guidance (CFG) and autoguidance (AG), fail to effectively guide the output toward a well-balanced space: CFG restricts adaptation to the target distribution, while AG compromises text alignment. To address these limitations, we propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Moreover, our method dynamically controls the extent of unlearning in the weak model through weight interpolation between the pre-trained and fine-tuned models during inference. Unlike existing guidance methods, which depend solely on guidance scales, our method explicitly steers the outputs toward a balanced latent space without additional computational overhead. Experimental results demonstrate that our proposed guidance improves both text alignment and fidelity to the target distribution, and integrates seamlessly with various fine-tuning strategies.
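The two ingredients described above, interpolating weights to form a weak model and then extrapolating away from its null-prompt prediction, can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the per-parameter linear interpolation, and the CFG-style update rule are assumptions inferred from the abstract.

```python
import numpy as np

def interpolate_weights(theta_pre, theta_ft, lam):
    """Blend pre-trained and fine-tuned parameters (hypothetical form):
    theta_weak = (1 - lam) * theta_pre + lam * theta_ft.
    lam = 1 recovers the fine-tuned model; smaller lam "unlearns" more."""
    return {k: (1.0 - lam) * theta_pre[k] + lam * theta_ft[k] for k in theta_pre}

def personalization_guidance(eps_weak_null, eps_ft_cond, scale):
    """CFG-style extrapolation (assumed update rule): start from the weak
    model's null-prompt noise prediction and push toward the fine-tuned,
    text-conditioned prediction by the guidance scale."""
    return eps_weak_null + scale * (eps_ft_cond - eps_weak_null)
```

With scale = 1 the guided prediction collapses to the fine-tuned conditional output; larger scales push further from the weak model's null-prompt prediction, which is where the fidelity/editability balance would be tuned.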