A General Framework for Inference-time Scaling and Steering of Diffusion Models

📅 2025-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving high control precision, generation quality, and computational efficiency during inference in diffusion models, this paper introduces the Feynman–Kac (FK) steering framework—a training-free, gradient-free inference-time guidance paradigm. Grounded in path integral theory, FK steering models multi-dimensional rewards (e.g., prompt fidelity, safety, linguistic quality) over intermediate latent states via a potential function, and employs multi-particle sampling with importance reweighting for dynamic, scalable attribute control. Unlike conventional fine-tuning, it avoids mode collapse and eliminates costly retraining. Experiments demonstrate that, in text-to-image generation, a 0.8B-parameter model guided by FK steering significantly outperforms a fine-tuned 2.6B model in prompt fidelity; in text generation, it reduces perplexity, improves linguistic acceptability, and enables precise, gradient-free control over fine-grained attributes such as toxicity.
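For intuition, one natural form of potential (a sketch in our own notation, not necessarily the paper's exact definition) scores the improvement in an intermediate reward $r$ between consecutive denoising states, so that the product of potentials along a trajectory telescopes into a weight on the final sample's reward:

```latex
% "Difference" potential at denoising step t, where x_t is the current
% (less noisy) state, x_{t+1} the previous state, and \lambda a temperature:
G_t(x_t, x_{t+1}) = \exp\!\big(\lambda \left[ r(x_t) - r(x_{t+1}) \right]\big)

% Along a full denoising trajectory x_T \to x_0 the product telescopes:
\prod_{t} G_t = \exp\!\big(\lambda \left[ r(x_0) - r(x_T) \right]\big)
```

Resampling particles in proportion to these potentials therefore duplicates trajectories whose intermediate reward is improving, biasing the system toward high-reward final samples without any gradients of the reward.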

📝 Abstract
Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we propose Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower-perplexity, more linguistically acceptable outputs, and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models, even with off-the-shelf rewards, can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering.
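The particle-resampling loop described in the abstract can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the paper's implementation: a 1-D contraction-plus-noise update stands in for a real diffusion sampler step, a hypothetical reward prefers samples near 2.0, and the potential is the exponentiated change in intermediate reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Toy reward: prefer samples near 2.0 (stands in for an
    # off-the-shelf reward model; a hypothetical choice).
    return -np.abs(x - 2.0)

def fk_steering(n_particles=256, n_steps=50, lam=2.0):
    """Sketch of FK steering on a toy 1-D sampler.

    A system of particles is evolved step by step; after each step,
    particles are resampled with weights given by the potential
    exp(lam * (r_t - r_prev)) built from intermediate rewards, so
    trajectories headed toward high-reward samples are duplicated.
    Setting lam = 0 disables steering (plain sampling).
    """
    x = rng.normal(0.0, 1.0, size=n_particles)  # start from pure noise
    prev_r = reward(x)
    for _ in range(n_steps):
        # Toy "denoising" step: contraction toward 0 plus noise
        # (stands in for a real diffusion sampler update).
        x = 0.9 * x + rng.normal(0.0, 0.3, size=n_particles)
        r = reward(x)
        # Potential: exponentiated improvement in intermediate reward.
        w = np.exp(lam * (r - prev_r))
        w /= w.sum()
        # Importance resampling keeps the particle count fixed while
        # duplicating promising particles and dropping poor ones.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, prev_r = x[idx], r[idx]
    return x

steered = fk_steering(lam=2.0)
unsteered = fk_steering(lam=0.0)
print(reward(steered).mean(), reward(unsteered).mean())
```

Because the method only evaluates the reward on intermediate states, no gradients of the reward are required; scaling the number of particles trades extra inference compute for higher-reward samples, which is the inference-time scaling behavior the paper studies.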
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Control and Customization
High-Quality Content Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feynman-Kac Steering
Real-time Guidance
Enhanced Control