Projected Gradient Ascent for Efficient Reward-Guided Updates with One-Step Generative Models

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work tackles reward hacking and inefficient optimization in test-time reward-guided generation by proposing a latent-space optimization method built on a hard constraint that enforces white Gaussian noise structure. Instead of conventional soft regularization, the approach explicitly preserves the white Gaussian noise property of the latent vector through a closed-form projection applied after each gradient ascent update. This prevents degradation in generation quality and reward hacking, with negligible additional computational overhead. Experiments show the method matches the aesthetic scores of the current state of the art while requiring only 30% of its runtime, substantially improving both optimization efficiency and reliability.

📝 Abstract
We propose a constrained latent optimization method for reward-guided generation that preserves white Gaussian noise characteristics with negligible overhead. Test-time latent optimization can unlock substantially better reward-guided generations from pretrained generative models, but it is prone to reward hacking that degrades quality and is often too slow for practical use. In this work, we make test-time optimization both efficient and reliable by replacing soft regularization with hard white Gaussian noise constraints enforced via projected gradient ascent. Our method applies a closed-form projection after each update to keep the latent vector explicitly noise-like throughout optimization, preventing the drift that leads to unrealistic artifacts. This enforcement adds minimal cost: the projection matches the $O(N \log N)$ complexity of standard algorithms such as sorting or the FFT and adds negligible wall-clock time in practice. In experiments, our approach reaches a comparable Aesthetic Score using only 30% of the wall-clock time required by the SOTA regularization-based method, while preventing reward hacking.
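The mechanism the abstract describes, a gradient ascent step on the reward followed by a closed-form projection that keeps the latent explicitly noise-like, can be sketched as follows. The specific projection used here (rank-based Gaussianization, whose cost is dominated by an $O(N \log N)$ sort) and the quadratic stand-in reward are illustrative assumptions, not the paper's exact operators.

```python
# Minimal sketch of test-time projected gradient ascent with a hard
# Gaussian-noise constraint on the latent. The projection below
# (rank-based Gaussianization, O(N log N) from the sort) and the
# quadratic stand-in reward are illustrative assumptions, not the
# paper's exact operators.
import numpy as np
from statistics import NormalDist  # stdlib Gaussian quantile function


def project_to_gaussian(z: np.ndarray) -> np.ndarray:
    """Replace each entry of z with the standard-normal quantile of its
    rank, so the latent's empirical distribution matches N(0, 1) while
    the ordering of entries is preserved."""
    n = z.size
    ranks = np.argsort(np.argsort(z))  # rank of each entry, O(N log N)
    inv_cdf = NormalDist().inv_cdf
    quantiles = np.array([inv_cdf((i + 0.5) / n) for i in range(n)])
    return quantiles[ranks]


def reward_grad(z: np.ndarray) -> np.ndarray:
    # Gradient of a hypothetical reward r(z) = -||z - 1||^2, standing in
    # for the true reward evaluated on the one-step generator's output.
    return -2.0 * (z - 1.0)


def projected_gradient_ascent(z0: np.ndarray, step: float = 0.1,
                              iters: int = 50) -> np.ndarray:
    z = project_to_gaussian(z0)
    for _ in range(iters):
        z = z + step * reward_grad(z)   # ascent step on the reward
        z = project_to_gaussian(z)      # snap back to noise-like latents
    return z


rng = np.random.default_rng(0)
z = projected_gradient_ascent(rng.standard_normal(4096))
print(float(z.mean()), float(z.std()))
```

Because the projection is applied after every step, the optimized latent can never drift into the out-of-distribution regions where a reward model scores highly but the generator produces artifacts, which is the failure mode the hard constraint is designed to rule out.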
Problem

Research questions and friction points this paper is trying to address.

reward-guided generation
reward hacking
test-time optimization
latent optimization
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

projected gradient ascent
white Gaussian noise constraint
reward-guided generation
latent optimization
reward hacking prevention
Jisung Hwang
KAIST, School of Computing, Daejeon, South Korea
Minhyuk Sung
KAIST
Computer Graphics · Computer Vision · Geometry Processing · 3D Deep Learning