Personalized Preference Fine-tuning of Diffusion Models

📅 2025-01-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing preference-alignment methods for text-to-image diffusion models (e.g., DPO) model only population-level preference distributions, failing to capture individual user preferences and thus limiting personalized image generation. Method: We propose PPD, a multi-reward alignment framework for diffusion models. PPD encodes user-specific preferences with a vision-language model and injects the resulting embeddings into the diffusion model through cross-attention, followed by DPO-style multi-reward fine-tuning. It extracts preference embeddings few-shot from as few as four paired examples and supports generalization to unseen users as well as inference-time interpolation between multiple preferences. Results: PPD achieves an average 76% win rate over Stable Cascade in human preference evaluation, significantly improving consistency with individual preferences and controllability of generated images.

📝 Abstract
RLHF techniques like DPO can significantly improve the generation quality of text-to-image diffusion models. However, these methods optimize for a single reward that aligns model generation with population-level preferences, neglecting the nuances of individual users' beliefs or values. This lack of personalization limits the efficacy of these models. To bridge this gap, we introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences. With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way, enabling generalization to unseen users. Specifically, our approach (1) leverages a vision-language model (VLM) to extract personal preference embeddings from a small set of pairwise preference examples, and then (2) incorporates the embeddings into diffusion models through cross attention. Conditioning on user embeddings, the text-to-image models are fine-tuned with the DPO objective, simultaneously optimizing for alignment with the preferences of multiple users. Empirical results demonstrate that our method effectively optimizes for multiple reward functions and can interpolate between them during inference. In real-world user scenarios, with as few as four preference examples from a new user, our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
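The two steps in the abstract (a user embedding extracted from a few pairwise preference examples, injected into the model via cross-attention, then trained with a DPO objective) can be illustrated with a minimal numpy sketch. Everything here is hypothetical scaffolding, not the paper's implementation: the mean-difference "embedding extractor" stands in for the VLM encoder, and `cross_attention` shows only the shape of the conditioning mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def user_embedding_from_pairs(preferred_feats, rejected_feats):
    """Few-shot preference embedding: mean feature difference of preferred
    vs. rejected images (a hypothetical stand-in for the VLM extractor)."""
    return (preferred_feats - rejected_feats).mean(axis=0, keepdims=True)

def cross_attention(image_tokens, user_embedding, W_q, W_k, W_v):
    """Inject the user embedding into image features: image tokens attend
    to the single user token, and the result is added residually."""
    q = image_tokens @ W_q                                   # (n, d)
    k = user_embedding @ W_k                                 # (1, d)
    v = user_embedding @ W_v                                 # (1, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n, 1)
    return image_tokens + attn @ v

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO objective: -log sigmoid of the model's preferred/rejected
    log-prob margin relative to the reference model's margin."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy usage with random features standing in for VLM outputs.
rng = np.random.default_rng(0)
d = 8
u = user_embedding_from_pairs(rng.normal(size=(4, d)),   # 4 preferred
                              rng.normal(size=(4, d)))   # 4 rejected
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
conditioned = cross_attention(rng.normal(size=(16, d)), u, Wq, Wk, Wv)
```

Conditioning on `u` while optimizing `dpo_loss` across many users is what lets a single model represent multiple reward functions; at inference, interpolating between two users' embeddings blends their preferences.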
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Personalization
Text-to-Image
Innovation

Methods, ideas, or system contributions that make the work stand out.

PPD method
VLM-based preference embeddings
personalized image generation