Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address downstream task misalignment and the intractability of marginal likelihoods in diffusion-model fine-tuning, this paper proposes P-GRAFT, a method that shapes the latent distribution at intermediate noise levels rather than fine-tuning the full generation path. Building on Rejection sAmpling based Fine-Tuning (RAFT), the paper unifies its variants as GRAFT and shows that this implicitly performs PPO with reshaped rewards; a bias-variance trade-off analysis explains why shaping intermediate distributions can be more effective. The paper additionally proposes inverse noise correction, which improves flow models without explicit reward signals. Applied to Stable Diffusion 2, P-GRAFT outperforms policy-gradient methods on popular text-to-image benchmarks and achieves an 8.81% relative improvement in VQAScore over the base model; for unconditional image generation, inverse noise correction improves FID at lower FLOPs per image. Experiments span text-to-image, layout, molecule, and unconditional image generation.

📝 Abstract
Diffusion models are widely used for generative tasks across domains. While pre-trained diffusion models effectively capture the training data distribution, it is often desirable to shape these distributions using reward functions to align with downstream applications. Policy gradient methods, such as Proximal Policy Optimization (PPO), are widely used in the context of autoregressive generation. However, the marginal likelihoods required for such methods are intractable for diffusion models, leading to alternative proposals and relaxations. In this context, we unify variants of Rejection sAmpling based Fine-Tuning (RAFT) as GRAFT, and show that this implicitly performs PPO with reshaped rewards. We then introduce P-GRAFT to shape distributions at intermediate noise levels and demonstrate empirically that this can lead to more effective fine-tuning. We mathematically explain this via a bias-variance tradeoff. Motivated by this, we propose inverse noise correction to improve flow models without leveraging explicit rewards. We empirically evaluate our methods on text-to-image (T2I) generation, layout generation, molecule generation and unconditional image generation. Notably, our framework, applied to Stable Diffusion 2, improves over policy gradient methods on popular T2I benchmarks in terms of VQAScore and shows an 8.81% relative improvement over the base model. For unconditional image generation, inverse noise correction improves FID of generated images at lower FLOPs/image.
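The rejection-sampling step at the heart of RAFT/GRAFT-style fine-tuning can be sketched in a few lines: generate from the current model, score with a reward function, and keep only the highest-reward samples for the subsequent supervised fine-tuning pass. The function name and the fixed acceptance fraction below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def graft_filter(samples, rewards, accept_fraction=0.25):
    """Keep the top-`accept_fraction` of samples by reward.

    Hypothetical sketch of the rejection-sampling step in
    RAFT/GRAFT-style fine-tuning: the accepted high-reward samples
    become the training set for a supervised fine-tuning pass.
    """
    rewards = np.asarray(rewards)
    k = max(1, int(len(rewards) * accept_fraction))
    top_idx = np.argsort(rewards)[-k:]  # indices of the k highest rewards
    return [samples[i] for i in top_idx]

# Toy usage: 8 "samples" scored by a dummy reward function.
samples = list(range(8))
rewards = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]
accepted = graft_filter(samples, rewards, accept_fraction=0.25)
print(sorted(accepted))  # the two highest-reward samples: [1, 3]
```

Because the filtered set is an explicit dataset, no likelihoods of the sampler are needed, which is what sidesteps the intractable marginals that block PPO-style methods for diffusion models.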
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning diffusion models using reward functions for downstream applications
Addressing intractable likelihoods in policy gradient methods for diffusion
Improving generation quality via intermediate noise distribution shaping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning diffusion models via intermediate distribution shaping
Introducing P-GRAFT for shaping intermediate noise levels
Proposing inverse noise correction to improve flow models
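P-GRAFT's key move, per the abstract, is to shape the distribution at an intermediate noise level rather than at the clean data. One way to picture this: noise the accepted samples forward to level t, so that only the partial model covering the noisier levels is fine-tuned toward them. The DDPM-style forward process and the function below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def to_intermediate_level(x0_batch, alpha_bar_t, rng=None):
    """Noise clean samples x0 forward to an intermediate level t.

    Assumes a DDPM-style forward process
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where `alpha_bar_t` is the cumulative noise-schedule value at t.
    The resulting x_t can serve as fine-tuning targets for the
    partial model that generates levels T..t (a sketch, not the
    paper's exact procedure).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0_batch.shape)
    return np.sqrt(alpha_bar_t) * x0_batch + np.sqrt(1.0 - alpha_bar_t) * eps

# Usage: at alpha_bar_t = 1 (t = 0) the samples are unchanged;
# smaller alpha_bar_t pushes them toward pure Gaussian noise.
x0 = np.ones((4, 3))
x_t = to_intermediate_level(x0, alpha_bar_t=0.5)
```

Intuitively, targeting an intermediate level trades off bias and variance: earlier levels carry a cleaner reward signal, later levels are easier to match, which is the trade-off the paper analyzes.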
Gautham Govind Anil
Google DeepMind
Shaan Ul Haque
Georgia Institute of Technology
Nithish Kannen
Google DeepMind
Deep Learning · Reasoning · Multimodality · Text-to-Image
Dheeraj Nagaraj
Research Scientist, Google
Statistics · Machine Learning
Sanjay Shakkottai
University of Texas at Austin
Karthikeyan Shanmugam
Google DeepMind