🤖 AI Summary
This work addresses the difficulty of making unconditional diffusion models generate, at inference time, samples whose aggregate attributes follow a specified target distribution, such as demographic or semantic proportions. The authors formalize this distributional alignment problem as an optimal control problem over the reverse diffusion process, introducing a time-dependent additive perturbation as the control signal. The control is optimized against a differentiable distribution-matching objective, with a penalty on control effort to preserve data fidelity. The proposed plug-and-play method requires no retraining or finetuning and adapts to diverse target distributions at test time. Experimental results on image generation show that the method aligns generated samples with the desired attribute distributions better than existing baselines.
📝 Abstract
Inference-time controllable generation is essential for real-world applications of unconditional diffusion models. However, most existing techniques focus on individual samples and struggle in applications that require the sample population to follow specific attribute distributions (e.g., demographic balance or semantic proportions). We formalize this setting as the inference-time attribute distributional alignment problem for pretrained unconditional diffusion models. To address it, we cast the problem as an optimal control problem over the reverse diffusion process, viewing the process as the rollout of a dynamical system and augmenting it with additive, time-dependent perturbations as the control. We solve for the perturbations using an optimal-control-based algorithm that optimizes a differentiable distribution-matching objective while penalizing control effort to preserve data fidelity. Experimental results in image generation demonstrate that our plug-and-play approach aligns attribute distributions to diverse and flexible test-time targets better than baselines, without retraining or finetuning the pretrained diffusion model.
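To make the control formulation concrete, here is a minimal toy sketch in pure Python. It is not the authors' algorithm, and everything specific in it is an assumption made for illustration: the one-dimensional drift standing in for a reverse diffusion step, the sigmoid standing in for an attribute classifier, the Bernoulli KL divergence as the distribution-matching objective, the per-sample controls `u[k][i]`, the adjoint-based gradient descent with backtracking, and all constants. It only shows the general shape of the idea: roll out a dynamical system with additive perturbations, then tune those perturbations so the population-level attribute proportion moves toward a target while a quadratic penalty keeps the perturbations small.

```python
import math
import random

# All constants are hypothetical choices for this toy example.
H, K, BETA = 0.1, 20, 4.0  # step size, number of steps, classifier sharpness

def drift_step(x):
    """One uncontrolled step of a toy 'reverse process' with modes near -1 and +1."""
    return x + H * (math.tanh(3.0 * x) - x)

def drift_step_grad(x):
    """Derivative of drift_step with respect to x."""
    return 1.0 + H * (3.0 * (1.0 - math.tanh(3.0 * x) ** 2) - 1.0)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rollout(x0, u):
    """Return the trajectory traj[k][i] under additive controls u[k][i]."""
    traj = [list(x0)]
    for k in range(K):
        traj.append([drift_step(x) + u[k][i] for i, x in enumerate(traj[-1])])
    return traj

def positive_fraction(x_final):
    """Soft population-level attribute proportion: mean of sigmoid(BETA * x)."""
    q = sum(sigmoid(BETA * x) for x in x_final) / len(x_final)
    return min(max(q, 1e-9), 1.0 - 1e-9)  # clamp so the logs below stay finite

def kl_bernoulli(r, q):
    """KL divergence between Bernoulli(r) (target) and Bernoulli(q) (generated)."""
    return r * math.log(r / q) + (1.0 - r) * math.log((1.0 - r) / (1.0 - q))

def cost(x0, u, r, lam):
    """Distribution-matching loss plus a quadratic control-effort penalty."""
    q = positive_fraction(rollout(x0, u)[-1])
    effort = sum(v * v for row in u for v in row) / len(x0)
    return kl_bernoulli(r, q) + lam * effort

def gradient(x0, u, r, lam):
    """Backward (adjoint) pass: gradient of N * cost with respect to each u[k][i]."""
    n = len(x0)
    traj = rollout(x0, u)
    q = positive_fraction(traj[-1])
    dkl_dq = (q - r) / (q * (1.0 - q))
    costate = [dkl_dq * BETA * sigmoid(BETA * x) * (1.0 - sigmoid(BETA * x))
               for x in traj[-1]]
    grad = [None] * K
    for k in reversed(range(K)):
        grad[k] = [costate[i] + 2.0 * lam * u[k][i] for i in range(n)]
        costate = [costate[i] * drift_step_grad(traj[k][i]) for i in range(n)]
    return grad

def solve_controls(x0, r, lam=0.01, iters=40, lr=1.0):
    """Gradient descent on the controls; a step is kept only if it lowers the cost."""
    n = len(x0)
    u = [[0.0] * n for _ in range(K)]
    best = cost(x0, u, r, lam)
    for _ in range(iters):
        g = gradient(x0, u, r, lam)
        step = lr
        while step > 1e-8:
            trial = [[u[k][i] - step * g[k][i] for i in range(n)] for k in range(K)]
            c = cost(x0, trial, r, lam)
            if c < best:
                u, best = trial, c
                break
            step *= 0.5
        else:
            break  # no step size improved the cost
    return u

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(200)]  # initial "noise" samples
target = 0.8  # desired proportion of samples with the positive attribute
zero = [[0.0] * len(x0) for _ in range(K)]
q_before = positive_fraction(rollout(x0, zero)[-1])
u_opt = solve_controls(x0, target)
q_after = positive_fraction(rollout(x0, u_opt)[-1])
print(f"positive fraction: {q_before:.3f} -> {q_after:.3f} (target {target})")
```

The penalty weight `lam` plays the role the abstract assigns to control effort: larger values keep trajectories closer to the uncontrolled process at the expense of a looser match to the target proportion. A real image setting would replace the scalar drift with the pretrained denoiser and the sigmoid with a differentiable attribute classifier, neither of which is specified here.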