AI Summary
Personalized text-to-image diffusion models struggle to simultaneously preserve subject fidelity and support fine-grained textual editing. Method: We propose a learning-free, low-overhead personalized guidance approach that introduces a pretrained weak model conditioned on the null prompt. By dynamically interpolating weights between this weak model and the fine-tuned model in latent space, our method explicitly balances subject fidelity and text alignment, synergistically integrating classifier-free guidance and self-guidance. Contribution/Results: The approach requires no additional parameters or training and is compatible with diverse fine-tuning strategies. Experiments demonstrate that it preserves the base model's broad textual understanding while significantly improving subject fidelity and text-image alignment, enabling high-quality, highly controllable personalized image generation.
Abstract
Personalizing text-to-image diffusion models is crucial for adapting pre-trained models to specific target concepts, enabling diverse image generation. However, fine-tuning with few images introduces an inherent trade-off between aligning with the target distribution (e.g., subject fidelity) and preserving the broad knowledge of the original model (e.g., text editability). Existing sampling guidance methods, such as classifier-free guidance (CFG) and autoguidance (AG), fail to effectively guide the output toward a well-balanced space: CFG restricts adaptation to the target distribution, while AG compromises text alignment. To address these limitations, we propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Moreover, our method dynamically controls the extent of unlearning in the weak model through weight interpolation between the pre-trained and fine-tuned models during inference. Unlike existing guidance methods, which depend solely on guidance scales, our method explicitly steers the outputs toward a balanced latent space without additional computational overhead. Experimental results demonstrate that our proposed guidance improves both text alignment and fidelity to the target distribution, and integrates seamlessly with various fine-tuning strategies.
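The two ingredients described above, interpolating weights to form a weak model and then extrapolating away from its null-prompt prediction, can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the per-parameter linear interpolation, and the CFG-style update rule are assumptions inferred from the abstract.

```python
import numpy as np

def interpolate_weights(theta_pre, theta_ft, lam):
    """Blend pre-trained and fine-tuned parameters (hypothetical form):
    theta_weak = (1 - lam) * theta_pre + lam * theta_ft.
    lam = 1 recovers the fine-tuned model; smaller lam "unlearns" more."""
    return {k: (1.0 - lam) * theta_pre[k] + lam * theta_ft[k] for k in theta_pre}

def personalization_guidance(eps_weak_null, eps_ft_cond, scale):
    """CFG-style extrapolation (assumed update rule): start from the weak
    model's null-prompt noise prediction and push toward the fine-tuned,
    text-conditioned prediction by the guidance scale."""
    return eps_weak_null + scale * (eps_ft_cond - eps_weak_null)
```

With scale = 1 the guided prediction collapses to the fine-tuned conditional output; larger scales push further from the weak model's null-prompt prediction, which is where the fidelity/editability balance would be tuned.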