🤖 AI Summary
In diffusion-based alignment, reinforcement learning or direct gradient optimization often leads to reward over-optimization and mode collapse. To address this, we propose a variational Expectation-Maximization (EM) framework that models alignment as an alternating iterative process: the E-step performs test-time search to generate high-reward, diverse samples, while the M-step fine-tunes the model by optimizing a variational lower bound. This approach explicitly balances reward maximization and diversity preservation without compromising generation quality. It unifies treatment across both continuous (e.g., text-to-image) and discrete (e.g., DNA sequence design) generative tasks. Experiments demonstrate that our method significantly mitigates mode collapse across multiple downstream benchmarks, achieving a more robust trade-off between reward and diversity. The results validate both its effectiveness and broad generalizability.
📝 Abstract
Diffusion alignment aims to optimize diffusion models for downstream objectives. While existing methods based on reinforcement learning or direct backpropagation achieve considerable success in maximizing rewards, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates diffusion alignment as an iterative process alternating between two complementary phases: the E-step and the M-step. In the E-step, we employ test-time search to generate diverse and reward-aligned samples. In the M-step, we refine the diffusion model using the samples discovered by the E-step. We demonstrate that DAV can optimize reward while preserving diversity for both continuous and discrete tasks: text-to-image synthesis and DNA sequence design.
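The alternating E-step/M-step loop described above can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration, not the paper's actual code: the "model" is a one-dimensional Gaussian rather than a diffusion model, the E-step is approximated as reward-tilted resampling of candidates, and the M-step is a maximum-likelihood refit standing in for optimizing the variational lower bound.

```python
import math
import random

class ToyGaussianModel:
    """Stand-in for a generative model: a Gaussian over scalars.
    (Hypothetical; a real DAV setup would use a diffusion model.)"""
    def __init__(self, mu=0.0, sigma=2.0):
        self.mu, self.sigma = mu, sigma

    def sample(self):
        return random.gauss(self.mu, self.sigma)

    def finetune(self, samples):
        # M-step surrogate: refit the model to the E-step samples
        # (stands in for maximizing a variational lower bound).
        self.mu = sum(samples) / len(samples)

def e_step(model, reward, n_candidates=64, temperature=0.5):
    # E-step surrogate: draw candidates from the current model, then
    # resample with reward-tilted weights so high-reward regions are
    # favored while retaining a spread of samples (diversity).
    xs = [model.sample() for _ in range(n_candidates)]
    ws = [math.exp(reward(x) / temperature) for x in xs]
    return random.choices(xs, weights=ws, k=n_candidates)

def dav_align(model, reward, n_rounds=10):
    # Alternate E-step search and M-step refinement.
    for _ in range(n_rounds):
        model.finetune(e_step(model, reward))
    return model

if __name__ == "__main__":
    random.seed(0)
    # Toy reward: prefer samples near 3.0.
    m = dav_align(ToyGaussianModel(), lambda x: -abs(x - 3.0))
    print(round(m.mu, 1))  # mean drifts toward the high-reward region
```

Because the M-step fits the model to reward-tilted samples from its own distribution (rather than greedily maximizing reward by gradient ascent), the toy loop shifts the model toward high-reward regions without collapsing its variance to a single point, mirroring the reward-diversity trade-off the framework targets.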