Hierarchical Variational Policies for Reward-Guided Diffusion

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high computational cost of adapting pretrained diffusion models to downstream objectives such as inverse problems, which typically requires expensive test-time guidance or optimization. The authors formulate test-time adaptation as a hierarchical variational model in which control is amortized into a lightweight yet expressive stochastic policy. Because the policy provides structured per-step control, sampling can use a few large steps without sacrificing sample quality. The resulting fully amortized sampler matches or exceeds recent test-time scaling baselines at significantly lower compute: on 4× super-resolution it achieves better perceptual quality with more than 5× faster inference than the best-performing baseline. A semi-amortized extension, which combines cheap amortized proposals with limited test-time optimization, achieves state-of-the-art perceptual quality across several challenging inverse problems.
📝 Abstract
Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality-speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality with more than 5x faster inference compared to the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.
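To make the idea of few-step sampling with per-step control concrete, here is a minimal toy sketch in NumPy. It is not the authors' method: the denoiser is a closed-form stand-in for a pretrained model (exact only for a standard Gaussian data prior), the "policy" is a hand-written Gaussian rather than a learned network, and every name in it (`toy_denoiser`, `toy_policy`, `few_step_sample`, `gain`, `noise_scale`) is hypothetical. It only shows the general structure the abstract describes: a few large deterministic steps, each corrected by a stochastic control term that pulls the sample toward the measurements.

```python
import numpy as np


def toy_denoiser(x, sigma):
    """Stand-in for a pretrained denoiser: E[x0 | x] under a N(0, I) data prior."""
    return x / (1.0 + sigma**2)


def toy_policy(x0_hat, y, A, sigma, rng, gain=1.0, noise_scale=0.05):
    """Hypothetical Gaussian policy (hand-written, NOT a learned network).

    Mean: a step toward measurement consistency, gain * A^T (y - A x0_hat).
    Noise: isotropic, scaled by the current noise level sigma.
    """
    mean = gain * (A.T @ (y - A @ x0_hat))
    return mean + noise_scale * sigma * rng.standard_normal(x0_hat.shape)


def few_step_sample(y, A, sigmas, rng, gain=1.0, noise_scale=0.05):
    """Few-step DDIM-style sampler with one policy control per step."""
    x = sigmas[0] * rng.standard_normal(A.shape[1])
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = toy_denoiser(x, sigma)
        # Per-step control: the policy output corrects the denoised estimate.
        x0_hat = x0_hat + toy_policy(x0_hat, y, A, sigma, rng, gain, noise_scale)
        # Deterministic (DDIM-style) jump from noise level sigma to sigma_next.
        x = x0_hat + (sigma_next / sigma) * (x - x0_hat)
    return x


# Toy inverse problem (an assumption for illustration): 2x average pooling
# of a 4-pixel signal.
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
y = A @ np.array([1.0, 2.0, 3.0, 4.0])
sigmas = np.array([10.0, 2.0, 0.5, 0.0])  # three large steps

x_ctrl = few_step_sample(y, A, sigmas, np.random.default_rng(0), gain=1.0)
x_free = few_step_sample(y, A, sigmas, np.random.default_rng(0), gain=0.0)
print("residual with control:   ", np.linalg.norm(y - A @ x_ctrl))
print("residual without control:", np.linalg.norm(y - A @ x_free))
```

Setting `gain=0.0` disables the measurement-driven part of the control, so comparing the two residuals shows how much the per-step correction contributes when only three steps are taken. In the paper's setting the policy is learned, whereas here it is fixed by hand purely for illustration.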
Problem

Research questions and friction points this paper is trying to address.

diffusion models
inverse problems
test-time adaptation
reward-guided generation
inference efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical variational policy
reward-guided diffusion
amortized inference
few-step diffusion sampling
test-time adaptation