Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models

📅 2026-02-03
🤖 AI Summary
This work addresses the misalignment between diffusion model outputs and human intent, a challenge exacerbated by the high computational cost of existing gradient-based guidance methods. The authors propose LiDAR sampling, the first closed-form reward-guided approach that operates without neural backpropagation. By estimating Expected Future Reward (EFR) from boundary samples of a pre-trained diffusion model and integrating lookahead sampling with a high-precision ODE solver, LiDAR efficiently steers generation toward high-reward outcomes. Remarkably, using only three samples and a three-step lookahead, LiDAR achieves GenEval scores on par with state-of-the-art gradient-guided methods on SDXL while accelerating inference by 9.5×.

📝 Abstract
Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies a test-time scaling method that enables sampling from regions with higher human-aligned reward values. Existing gradient guidance methods approximate the expected future reward (EFR) at an intermediate particle $\mathbf{x}_t$ using a Taylor approximation, but this approximation at each time step incurs high computational cost due to sequential neural backpropagation. We show that the EFR at any $\mathbf{x}_t$ can be computed using only marginal samples from a pre-trained diffusion model. The proposed EFR formulation detaches the neural dependency between $\mathbf{x}_t$ and the EFR, enabling closed-form guidance computation without neural backpropagation. To further improve efficiency, we introduce lookahead sampling to collect marginal samples. For final sample generation, we use an accurate solver that guides particles toward high-reward lookahead samples. We refer to this sampling scheme as LiDAR sampling. LiDAR achieves substantial performance improvements using only three samples with a 3-step lookahead solver, exhibiting steep performance gains as lookahead accuracy and sample count increase; notably, it reaches the same GenEval performance as the latest gradient guidance method for SDXL with a 9.5x speedup.
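The abstract's core loop, collecting a few lookahead "boundary" samples per step, scoring them with a reward, and steering the particle toward the best one, can be sketched in miniature. This is not the paper's implementation: the stochastic branching, the toy denoiser, the step-size constants, and the function names (`lidar_style_step`, `denoise_step`, `reward_fn`) are all illustrative assumptions.

```python
import numpy as np

def lidar_style_step(x_t, t, denoise_step, reward_fn,
                     n_samples=3, lookahead_steps=3, rng=None):
    """Hedged sketch of one lookahead reward-guided update.

    For each of `n_samples` stochastic branches, run a short
    `lookahead_steps`-step deterministic rollout toward an
    approximate boundary sample, score it with `reward_fn`,
    and move x_t toward the highest-reward lookahead sample —
    no backpropagation through the denoiser is needed.
    """
    rng = rng or np.random.default_rng(0)
    candidates, rewards = [], []
    for _ in range(n_samples):
        x = x_t + 0.1 * rng.standard_normal(x_t.shape)  # stochastic branch
        s = t
        for _ in range(lookahead_steps):  # cheap ODE-style rollout to t ≈ 0
            x = denoise_step(x, s)
            s = max(s - t / lookahead_steps, 0.0)
        candidates.append(x)
        rewards.append(reward_fn(x))
    best = candidates[int(np.argmax(rewards))]
    # closed-form guidance: pull the particle toward the high-reward sample
    return x_t + 0.5 * (best - x_t)
```

With a toy denoiser that shrinks the state toward zero and a reward favoring small norm, the returned particle moves toward the high-reward region without any gradient of the network, which is the efficiency argument the abstract makes.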
Problem

Research questions and friction points this paper is trying to address.

diffusion models
test-time scaling
reward guidance
human alignment
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
test-time scaling
reward guidance
lookahead sampling
closed-form guidance
Yeongmin Kim
KAIST
Generative Models · Machine Learning

Donghyeok Shin
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Byeonghu Na
KAIST
Generative Model · Diffusion Model

Minsang Park
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Richard Lee Kim
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Il-Chul Moon
Professor, Department of Industrial and Systems Engineering, KAIST
Modeling and Simulation · Artificial Intelligence