🤖 AI Summary
Existing reward alignment algorithms for flow matching and diffusion models typically rely on SDE-based sampling to realize Markov transitions, which is significantly less efficient, and often less performant, than ODE-based sampling. The result is a trade-off between stochasticity and inference speed. This work proposes GLASS Flows, a sampling paradigm that simulates a “flow matching model within a flow matching model” to sample Markov transitions. The inner model is retrieved from a pre-trained model without retraining, so the method combines the efficiency of ODE solvers with the stochastic evolution of SDEs. Combined with Feynman–Kac Steering, GLASS Flows acts as a drop-in, inference-time method for reward alignment. On large-scale text-to-image models, it removes the trade-off between stochastic evolution and efficiency and improves on state-of-the-art text-to-image generation performance.
📝 Abstract
The performance of flow matching and diffusion models can be greatly improved at inference time using reward alignment algorithms, yet efficiency remains a major limitation. While several algorithms have been proposed, we demonstrate that a common bottleneck is the sampling method these algorithms rely on: many of them require sampling Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from a pre-trained model without any re-training, combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. Combined with Feynman-Kac Steering, GLASS Flows improve state-of-the-art performance in text-to-image generation, making them a simple, drop-in solution for inference-time scaling of flow and diffusion models.
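To make the ODE versus SDE distinction in the abstract concrete, the following is a minimal toy sketch. It is not the GLASS Flows algorithm and not code from the paper. It assumes a hypothetical 1D setup with noise x0 ~ N(0, 1), data x1 ~ N(M, S^2), and the linear path x_t = (1 - t) x0 + t x1, for which the marginal velocity and score are available in closed form. The names `M`, `S`, `velocity`, `score`, `ode_sample`, `sde_sample`, and the diffusion coefficient `g` are all illustrative assumptions. The ODE sampler is deterministic given its starting point, while the SDE sampler adds a score correction and fresh noise at every step, so each step is a stochastic Markov transition that targets the same marginals.

```python
import math
import random

# Hypothetical 1D "data" distribution N(M, S^2); the noise source is N(0, 1).
M, S = 2.0, 0.5


def mean_var(t):
    """Mean and variance of p_t for the path x_t = (1 - t) * x0 + t * x1."""
    return t * M, (1.0 - t) ** 2 + (t * S) ** 2


def velocity(x, t):
    """Closed-form marginal velocity field for this Gaussian toy path."""
    mu, var = mean_var(t)
    return M + (t * S * S - (1.0 - t)) / var * (x - mu)


def score(x, t):
    """Closed-form score (gradient of log p_t) for the same path."""
    mu, var = mean_var(t)
    return -(x - mu) / var


def ode_sample(x0, steps=200):
    """Euler integration of dx = u_t(x) dt: deterministic given x0."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += velocity(x, k * dt) * dt
    return x


def sde_sample(x0, rng, steps=200, g=1.0):
    """Euler-Maruyama for dx = [u_t + (g^2 / 2) * score] dt + g dW.

    The drift correction keeps the same marginals p_t as the ODE,
    but each step is a random (Markov) transition.
    """
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        drift = velocity(x, t) + 0.5 * g * g * score(x, t)
        x += drift * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x
```

In this toy case both samplers land approximately on N(M, S^2), but only the SDE sampler returns different endpoints when restarted from the same intermediate state, which is the stochastic evolution that the abstract says reward alignment algorithms depend on.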