Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of dynamically aligning diffusion models with multiple user preferences (such as aesthetics, text fidelity, and image realism) at inference time, where each preference tolerates a different degree of deviation from the base model. We propose a zero-shot, fine-tuning-free multi-preference alignment paradigm. Methodologically, the reverse diffusion process is treated as an interpolatable latent-space operation: multi-reward fusion is achieved by reweighting posterior sampling, and a linearly mixed reverse process with an explicitly parameterized KL regularization strength decouples reward optimization from regularization control. Contributions include: (i) real-time specification of arbitrary linear reward combinations and KL strengths, enabling millisecond-level preference switching; (ii) a single model matching the performance of multiple specialized fine-tuned models; and (iii) significant gains over single-objective fine-tuning and existing RLHF methods across multiple benchmarks, improving both controllability and deployment efficiency.
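The linearly mixed reverse process described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `blended_reverse_step`, the DDIM-style deterministic update, and the toy noise schedule are all illustrative assumptions; only the idea of linearly blending per-reward noise predictions with user-chosen weights comes from the summary.

```python
import numpy as np

def blended_reverse_step(x_t, t, eps_models, weights, alpha_bar):
    """One deterministic (DDIM-style) reverse step driven by a linear
    blend of noise predictions from several reward-specific models.

    eps_models : list of callables (x_t, t) -> predicted noise
    weights    : user-chosen reward mixture (sums to 1), set at inference
    alpha_bar  : cumulative signal schedule, shape (T,)
    """
    # Blend the per-reward noise predictions with the user's weights.
    eps = sum(w * m(x_t, t) for w, m in zip(weights, eps_models))
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    # Standard DDIM-style update using the blended prediction.
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
```

Because the blend weights enter only at sampling time, switching preferences amounts to changing a weight vector, which is what makes millisecond-level preference switching plausible without any retraining.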

📝 Abstract
Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional fine-tuning? We propose Diffusion Blend, a novel approach to solve inference-time multi-preference alignment by blending backward diffusion processes associated with fine-tuned models, and we instantiate this approach with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.
Problem

Research questions and friction points this paper is trying to address.

Align diffusion models with multiple conflicting objectives
Enable user-specified reward combinations at inference time
Balance preferences without additional fine-tuning per case
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blends backward diffusion processes for alignment
Enables multi-reward alignment at inference time
Controls KL regularization without fine-tuning
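The third point, KL control without fine-tuning, can also be sketched. A hedged illustration (the function name `kl_blended_eps` and the linear interpolation are assumptions for exposition, not the exact DB-KLA algorithm): exposing the regularization strength as an inference-time dial can be pictured as interpolating between the base model's and a fine-tuned model's noise predictions at every reverse step.

```python
import numpy as np

def kl_blended_eps(x_t, t, eps_base, eps_tuned, lam):
    """Illustrative inference-time KL control: interpolate between the
    base model's and a fine-tuned model's noise predictions.

    lam = 0.0 recovers the base model (strongest pull toward it);
    lam = 1.0 recovers the fine-tuned model (the reference KL strength);
    intermediate values trade reward optimization against staying close
    to the base model, with no additional fine-tuning.
    """
    return (1.0 - lam) * eps_base(x_t, t) + lam * eps_tuned(x_t, t)
```

In this picture, one fine-tuning run per basis reward suffices; the regularization knob, like the reward weights, is resolved per request at sampling time.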
Min Cheng
Texas A&M University
reinforcement learning, optimization, diffusion model
Fatemeh Doudi
Texas A&M University
D. Kalathil
Texas A&M University
Mohammad Ghavamzadeh
Amazon AGI
Reinforcement Learning, Online Learning, Machine Learning, Control, AI
Panganamala R. Kumar
Texas A&M University