🤖 AI Summary
This work addresses the challenge of efficiently adapting diffusion models to reward-tilted distributions when reward gradients are unavailable. We propose Iterative Reward Tilting (IRT), a gradient-free fine-tuning method that decomposes a large reward tilt into a sequence of small, incremental adjustments. Each iteration relies solely on forward evaluations of the reward function and a first-order Taylor expansion to update the score function, eliminating the need for backpropagation or derivative computation along sampling trajectories. IRT is, to our knowledge, the first iterative tilting framework that combines closed-form verification, provably stable convergence, and zero gradient backpropagation; it converges exactly to the theoretical optimum in a 2D Gaussian-mixture setting with a linear reward. Empirical results demonstrate substantial improvements in fine-tuning efficiency and numerical stability over existing gradient-free approaches.
📝 Abstract
We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the reward function and avoids backpropagating through sampling chains. We validate on a two-dimensional Gaussian mixture with linear reward, where the exact tilted distribution is available in closed form.
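To make the closed-form validation setting concrete, the sketch below (an illustration under our own assumptions, not the authors' code) uses the standard fact that tilting a Gaussian mixture $\sum_k \pi_k \mathcal{N}(\mu_k, \Sigma_k)$ by $\exp(\varepsilon\, w^\top x)$ yields another Gaussian mixture with means shifted to $\mu_k + \varepsilon \Sigma_k w$, unchanged covariances, and component weights rescaled by $\exp(\varepsilon w^\top \mu_k + \tfrac{1}{2}\varepsilon^2 w^\top \Sigma_k w)$. It then checks that $N$ sequential small tilts of size $\lambda/N$ compose to the same distribution as one tilt of size $\lambda$; the helper name `tilt_once` and the specific mixture parameters are hypothetical.

```python
# Sketch: for a Gaussian mixture and a linear reward r(x) = w^T x, the
# exp(lambda * r)-tilted distribution is available in closed form, so the
# decomposition of one large tilt into N small sequential tilts can be
# verified exactly. All names and parameters here are illustrative.
import numpy as np

def tilt_once(pis, mus, Sigmas, w, eps):
    """Apply one exponential tilt exp(eps * w^T x) to a Gaussian mixture.

    Each component N(mu, Sigma) becomes N(mu + eps*Sigma@w, Sigma); its
    weight is scaled by exp(eps*w@mu + 0.5*eps^2 * w@Sigma@w), then the
    weights are renormalized.
    """
    log_scale = np.array([
        eps * (w @ mu) + 0.5 * eps**2 * (w @ S @ w)
        for mu, S in zip(mus, Sigmas)
    ])
    new_pis = pis * np.exp(log_scale)
    new_pis /= new_pis.sum()
    new_mus = [mu + eps * (S @ w) for mu, S in zip(mus, Sigmas)]
    return new_pis, new_mus, Sigmas  # covariances are unchanged

# A 2D two-component mixture, linear reward direction w, tilt strength lam.
pis = np.array([0.6, 0.4])
mus = [np.array([-2.0, 0.0]), np.array([2.0, 1.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
w, lam, N = np.array([1.0, -0.5]), 2.0, 50

# One-shot tilt by lam vs. N sequential tilts of size lam/N each.
d_pis, d_mus, _ = tilt_once(pis, mus, Sigmas, w, lam)
p, m, S = pis, mus, Sigmas
for _ in range(N):
    p, m, S = tilt_once(p, m, S, w, lam / N)

print(np.allclose(d_pis, p))
print(all(np.allclose(a, b) for a, b in zip(d_mus, m)))
```

For a linear reward the composition is exact (up to floating point): the mean shifts accumulate to $\lambda \Sigma_k w$ and the per-step weight factors multiply out to the one-shot factor, which is why this setting admits closed-form verification of the iterative decomposition.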