Steering Guidance for Personalized Text-to-Image Diffusion Models

📅 2025-08-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Personalized text-to-image diffusion models struggle to simultaneously preserve subject fidelity and support fine-grained textual editing. Method: We propose a learning-free, low-overhead personalized guidance approach that introduces a pretrained weak model conditioned on the null prompt. By dynamically interpolating weights between this weak model and the fine-tuned model in latent space, our method explicitly balances subject fidelity and text alignment, synergistically integrating classifier-free guidance and self-guidance. Contribution/Results: The approach requires no additional parameters or training and is compatible with diverse fine-tuning strategies. Experiments demonstrate that it preserves the base model's broad textual understanding while significantly improving subject fidelity and text–image alignment, enabling high-quality, highly controllable personalized image generation.

๐Ÿ“ Abstract
Personalizing text-to-image diffusion models is crucial for adapting pre-trained models to specific target concepts, enabling diverse image generation. However, fine-tuning with few images introduces an inherent trade-off between aligning with the target distribution (e.g., subject fidelity) and preserving the broad knowledge of the original model (e.g., text editability). Existing sampling guidance methods, such as classifier-free guidance (CFG) and autoguidance (AG), fail to effectively guide the output toward a well-balanced space: CFG restricts adaptation to the target distribution, while AG compromises text alignment. To address these limitations, we propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Moreover, our method dynamically controls the extent of unlearning in the weak model through weight interpolation between the pre-trained and fine-tuned models during inference. Unlike existing guidance methods, which depend solely on guidance scales, our method explicitly steers the outputs toward a balanced latent space without additional computational overhead. Experimental results demonstrate that our proposed guidance can improve text alignment and target distribution fidelity, integrating seamlessly with various fine-tuning strategies.
Problem

Research questions and friction points this paper is trying to address.

Balancing target distribution alignment and original model knowledge retention
Overcoming limitations of existing sampling guidance methods
Dynamic control of unlearning extent for balanced latent space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an unlearned weak model conditioned on a null text prompt
Dynamically controls the extent of unlearning via weight interpolation between pre-trained and fine-tuned models
Steers outputs toward a balanced latent space without extra computational overhead
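The two core operations described above, forming a weak model by interpolating pre-trained and fine-tuned weights, then extrapolating from its null-prompt prediction toward the fine-tuned conditional prediction, can be sketched as follows. The paper does not publish its exact update rule here, so the function names, the interpolation coefficient `lam`, and the CFG-style guidance formula are all assumptions for illustration, not the authors' verbatim method.

```python
import numpy as np

def interpolate_weights(pretrained: dict, finetuned: dict, lam: float) -> dict:
    """Form the 'weak' model by blending weights per parameter tensor.

    lam = 0.0 recovers the pre-trained model; lam = 1.0 recovers the
    fine-tuned model. Intermediate values control the extent of
    unlearning at inference time, with no retraining. (Hypothetical
    sketch; the paper's interpolation schedule may differ.)
    """
    return {name: (1.0 - lam) * pretrained[name] + lam * finetuned[name]
            for name in pretrained}

def personalization_guidance(eps_fine_cond: np.ndarray,
                             eps_weak_null: np.ndarray,
                             scale: float) -> np.ndarray:
    """CFG-style extrapolation (assumed form): start from the weak
    model's null-prompt noise prediction and push toward the fine-tuned
    model's text-conditional prediction by the guidance scale."""
    return eps_weak_null + scale * (eps_fine_cond - eps_weak_null)
```

A usage note on the design: because the weak model is built by weight interpolation rather than by training an auxiliary network, the balance between subject fidelity and text alignment is controlled by two inference-time knobs (`lam` and `scale`), consistent with the paper's claim of no additional parameters or training.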
🔎 Similar Papers
No similar papers found.
Sunghyun Park — Qualcomm AI Research
Seokeon Choi — Qualcomm AI Research
Computer vision · Machine learning · Image generation · Domain generalization · Person re-identification
Hyoungwoo Park — Qualcomm AI Research
Sungrack Yun — Qualcomm AI Research