🤖 AI Summary
This work addresses the challenge of aligning diffusion models with multiple user preferences (such as aesthetics, text fidelity, and image realism) at inference time, where each preference carries a distinct tolerance for deviation from the base model. The authors propose a multi-preference alignment paradigm that requires no additional fine-tuning at deployment. Methodologically, the reverse diffusion process is treated as an interpolatable latent-space operation: multi-reward fusion is achieved by reweighting posterior sampling, and a linearly blended reverse process is introduced whose KL regularization strength is explicitly parameterized, decoupling reward optimization from regularization control. Contributions include: (i) real-time specification of arbitrary linear reward combinations and KL strengths, enabling millisecond-level preference switching; (ii) a single model matching the performance of multiple specialized fine-tuned models; and (iii) consistent improvements over single-objective fine-tuning and existing RLHF baselines across multiple benchmarks, improving both controllability and deployment efficiency.
📝 Abstract
Reinforcement learning (RL) algorithms have recently been used to align diffusion models with downstream objectives, such as aesthetic quality and text-image consistency, by fine-tuning them to maximize a single reward function under fixed KL regularization. This approach is inherently restrictive in practice, however, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with differing tolerances for deviation from the pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, the model can generate images aligned with any user-specified linear combination of rewards and regularization, without additional fine-tuning? We propose Diffusion Blend, a novel approach that solves inference-time multi-preference alignment by blending the backward diffusion processes of fine-tuned models, and we instantiate it with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that the Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.
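The blending idea described above, where the backward processes of several reward-specific fine-tuned models are combined under user-chosen weights at each denoising step, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the plain-list representation of latents, and the simple weighted average of noise predictions are all assumptions made for clarity.

```python
def blend_noise_predictions(preds, weights):
    """Linearly combine per-model noise (epsilon) predictions at one
    reverse-diffusion step. Illustrative sketch only: names and the
    flat-list latent representation are assumptions, not the paper's API.

    preds   -- list of noise predictions, one per reward-specific model,
               each a flat list of floats over the latent dimensions
    weights -- user-chosen preference coefficients, one per model
    """
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalize preference weights
    dim = len(preds[0])
    # Weighted average across models, dimension by dimension.
    return [sum(wi * p[i] for wi, p in zip(w, preds)) for i in range(dim)]


# Toy usage: two "models" whose predictions are constant vectors.
eps_a = [1.0, 1.0, 1.0, 1.0]  # stand-in for a model tuned on reward A
eps_b = [0.0, 0.0, 0.0, 0.0]  # stand-in for a model tuned on reward B
blended = blend_noise_predictions([eps_a, eps_b], [1.0, 3.0])
# weights normalize to [0.25, 0.75], so every entry is 0.25
```

In an actual sampler, this blended prediction would replace the single model's output at each step of the reverse process, so changing the weights changes the alignment target without any retraining.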