🤖 AI Summary
Steering diffusion models toward downstream objectives typically relies on gradient-based guidance or model fine-tuning, both of which are costly and require either differentiable reward functions or retraining. CoDe (controlled denoising) is a gradient-free, inference-time guidance method that needs neither. It operates blockwise: sampling is partitioned into blocks of intermediate denoising steps, and within each block generation is steered by reward-based selection rather than backpropagation, so no gradients flow through the reward or the model. Because CoDe avoids backpropagation and retraining entirely, it keeps inference lightweight while aligning samples with downstream rewards. Despite its simplicity, experiments show that CoDe offers a favorable trade-off between reward alignment, prompt instruction following, and inference cost, performing competitively with state-of-the-art baselines.
📝 Abstract
Aligning diffusion models to downstream tasks often requires finetuning new models or gradient-based guidance at inference time to enable sampling from the reward-tilted posterior. In this work, we explore a simple inference-time gradient-free guidance approach, called controlled denoising (CoDe), that circumvents the need for differentiable guidance functions and model finetuning. CoDe is a blockwise sampling method applied during intermediate denoising steps, allowing for alignment with downstream rewards. Our experiments demonstrate that, despite its simplicity, CoDe offers a favorable trade-off between reward alignment, prompt instruction following, and inference cost, achieving a competitive performance against the state-of-the-art baselines. Our code is available at: https://github.com/anujinho/code.
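The blockwise, gradient-free idea from the abstract can be sketched in a toy form. Everything below is illustrative, not the paper's implementation: `denoise_step`, `reward`, and all hyperparameters are placeholders, and the selection rule shown (roll out several candidate continuations per block, keep the highest-reward one) is one simple instantiation of reward-guided blockwise sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # Toy stand-in for one reverse-diffusion step: drift toward a target
    # with some noise. A real sampler would call the learned denoiser here.
    target = 3.0
    return x + 0.1 * (target - x) + 0.05 * rng.standard_normal(x.shape)

def reward(x):
    # Toy reward: prefer samples near 3.0. Note it is evaluated, never
    # differentiated -- no gradients are required.
    return -np.abs(x - 3.0).mean()

def blockwise_guided_sample(x_init, num_steps=50, block_size=10, n_candidates=4):
    """Partition denoising into blocks; at each block, roll out several
    candidate continuations and keep the one with the highest reward."""
    x = x_init
    for start in range(0, num_steps, block_size):
        candidates = []
        for _ in range(n_candidates):
            xc = x.copy()
            for t in range(start, min(start + block_size, num_steps)):
                xc = denoise_step(xc, t)
            candidates.append(xc)
        x = max(candidates, key=reward)  # selection, not backpropagation
    return x

x0 = rng.standard_normal(4)           # start from pure noise
sample = blockwise_guided_sample(x0)  # reward-aligned sample
```

Setting `block_size=num_steps` recovers plain best-of-N over full trajectories; smaller blocks intervene more often at intermediate denoising steps, which is the trade-off the method exploits.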