🤖 AI Summary
This work addresses two key challenges: efficient sampling from unnormalized densities and reward-based fine-tuning of generative models. We propose Tilt Matching, an algorithm built on a dynamical equation that relates the flow matching velocity field to the velocity targeting the same distribution tilted by a reward, implicitly solving a stochastic optimal control problem without explicit reward gradients or backpropagation through trajectories. The update to the velocity field can be read as the sum of all joint cumulants of the stochastic interpolant and copies of the reward, and to first order it reduces to their covariance; the tilted velocity is also the minimizer of an objective with strictly lower variance than flow matching itself. Empirically, Tilt Matching achieves state-of-the-art performance on sampling under Lennard-Jones potentials and matches leading methods on reward fine-tuning of Stable Diffusion, while eliminating the need for heuristic reward scaling and naturally supporting few-step flow map models.
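To make the first-order claim concrete, here is one possible reading, stated purely as an assumption: if "tilting" means reweighting the data distribution by the exponential of the reward $r$, and $b(t,x)$ and $b^r(t,x)$ denote the flow matching and tilted velocities with $\dot I_t$ the time derivative of the stochastic interpolant, then a standard cumulant expansion of the tilted conditional expectation gives

$$
b^r(t,x) \;=\; b(t,x) \;+\; \sum_{n\ge 1}\frac{1}{n!}\,\kappa\!\left(\dot I_t,\underbrace{r,\dots,r}_{n\ \text{copies}} \,\middle|\, x_t = x\right)
\;\approx\; b(t,x) \;+\; \operatorname{Cov}\!\left(\dot I_t,\, r \,\middle|\, x_t = x\right).
$$

The notation and the exponential-tilt reading are illustrative assumptions, not definitions taken from the paper.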
📝 Abstract
We propose a simple, scalable algorithm for using stochastic interpolants to sample from unnormalized densities and for fine-tuning generative models. The approach, Tilt Matching, arises from a dynamical equation relating the flow matching velocity to one targeting the same distribution tilted by a reward, implicitly solving a stochastic optimal control problem. The new velocity inherits the regularity of stochastic interpolant transports while also being the minimizer of an objective with strictly lower variance than flow matching itself. The update to the velocity field can be interpreted as the sum of all joint cumulants of the stochastic interpolant and copies of the reward, and to first order is their covariance. The algorithms do not require any access to gradients of the reward or backpropagation through trajectories of the flow or diffusion. We empirically verify that the approach is efficient and highly scalable, providing state-of-the-art results on sampling under Lennard-Jones potentials and competitive results on fine-tuning Stable Diffusion, without requiring reward multipliers. It can also be straightforwardly applied to tilting few-step flow map models.
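As a concrete illustration of the kind of gradient-free, reward-weighted regression the abstract describes, below is a minimal sketch of a flow-matching training step in which the reward enters only through detached, exponentially tilted sample weights. The linear interpolant, the self-normalized weighting, and the names `velocity_net` and `reward` are assumptions made for illustration, not the paper's actual Tilt Matching objective.

```python
import torch

def reward_tilted_fm_loss(velocity_net, reward, x0, x1, t):
    """Sketch of a reward-weighted flow matching step (illustrative only).

    The reward enters solely through detached, self-normalized exponential
    weights, so no reward gradients and no backpropagation through sampling
    trajectories are required.
    """
    # Linear stochastic interpolant x_t = (1 - t) x0 + t x1 (an assumption;
    # other interpolants could be used).
    t_b = t.view(-1, *([1] * (x0.dim() - 1)))
    xt = (1 - t_b) * x0 + t_b * x1
    target = x1 - x0  # time derivative of the linear interpolant

    # Exponential tilt by the reward of the data endpoint, normalized to
    # have mean one over the batch for stability.
    with torch.no_grad():
        w = torch.softmax(reward(x1), dim=0) * x1.shape[0]

    pred = velocity_net(xt, t)
    per_sample = ((pred - target) ** 2).reshape(x0.shape[0], -1).mean(dim=1)
    return (w * per_sample).mean()
```

Under this assumed weighting, the minimizer of the loss is the exponentially tilted conditional velocity, consistent with the cumulant and covariance interpretation sketched above, and the reward is only ever evaluated, never differentiated.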