🤖 AI Summary
Steering a generative model toward a user-specified reward, such as aesthetic quality or human preference, is known as guidance. Existing guidance methods either require expensive multi-particle, many-step schemes or rely on poorly understood approximations. This work reframes guidance as a deterministic optimal control problem and shows that the flow map arises naturally in the optimal solution. The result is Flow Map Reward Guidance (FMRG), a training-free, single-trajectory method that uses the flow map to both integrate and guide the flow. At text-to-image scale, FMRG matches or surpasses baselines across inverse problems, style transfer, human preference optimization, and VLM-based rewards with as few as 3 function evaluations (NFEs), at least an order-of-magnitude speedup over the prior state of the art. The optimal control formulation also yields a hierarchy of algorithms that subsumes existing guidance approaches at its coarsest level.
📝 Abstract
In generative modeling, we often wish to produce samples that maximize a user-specified reward such as aesthetic quality or alignment with human preferences, a problem known as guidance. Despite their widespread use, existing guidance methods either require expensive multi-particle, many-step schemes or rely on poorly understood approximations. We reformulate guidance as a deterministic optimal control problem, yielding a hierarchy of algorithms that subsumes existing approaches at the coarsest level. We show that the flow map, an object of significant recent interest for its role in fast inference, arises naturally in the optimal solution. Based on this observation, we propose Flow Map Reward Guidance (FMRG): a training-free, single-trajectory framework that uses the flow map to both integrate and guide the flow. At text-to-image scale, FMRG matches or surpasses baselines across inverse problems, style transfer, human preferences, and VLM rewards with as few as 3 NFEs, giving at least an order-of-magnitude speedup in comparison to prior state of the art.
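To make the phrase "uses the flow map to both integrate and guide the flow" concrete, here is a minimal 1D sketch. It is not the FMRG algorithm, whose update rule the abstract does not specify. Everything in it is an assumption chosen for illustration: a Gaussian toy problem whose flow map is available in closed form, a quadratic reward, and a simple look-ahead gradient nudge scaled by the step size. The names `flow_map`, `guided_sample`, `M`, `S`, and `scale` are hypothetical.

```python
import math

# Toy setup (hypothetical values): source N(0, 1) at t=0, data N(M, S^2) at t=1,
# with the linear interpolant x_t = (1 - t) * x0 + t * x1 and independent endpoints.
M, S = 2.0, 0.5

def mean(t):
    """Mean of the marginal at time t."""
    return t * M

def std(t):
    """Standard deviation of the marginal at time t."""
    return math.sqrt((1.0 - t) ** 2 + (t * S) ** 2)

def flow_map(x, s, t):
    """Closed-form flow map X_{s,t}: carries a point from time s to time t."""
    return mean(t) + std(t) / std(s) * (x - mean(s))

def flow_map_jac(s, t):
    """Derivative of X_{s,t}(x) with respect to x (constant in x for this toy)."""
    return std(t) / std(s)

def reward_grad(x, target):
    """Gradient of the toy reward r(x) = -0.5 * (x - target)^2."""
    return target - x

def guided_sample(x0, target, n_steps=3, scale=1.0):
    """Single trajectory: look ahead with the flow map, nudge, then jump.

    The nudge rule below is an illustrative assumption, not the paper's method.
    """
    ts = [i / n_steps for i in range(n_steps + 1)]
    x = x0
    for s, t in zip(ts[:-1], ts[1:]):
        x1_hat = flow_map(x, s, 1.0)  # one-jump estimate of the final sample
        # Chain rule: reward gradient at the look-ahead, pulled back to time s.
        g = flow_map_jac(s, 1.0) * reward_grad(x1_hat, target)
        # Nudge the current state, then integrate s -> t with the same flow map.
        x = flow_map(x + scale * (t - s) * g, s, t)
    return x

unguided = guided_sample(0.0, target=3.0, scale=0.0)
guided = guided_sample(0.0, target=3.0, scale=1.0)
print(unguided, guided)
```

With `scale=0.0` the loop reduces to composing flow-map jumps, so the sample lands where the unguided flow would send it. With a positive `scale`, the same three jumps carry the sample closer to the reward's maximizer. In a real model the closed-form `flow_map` would be replaced by a learned network and its derivative by automatic differentiation.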