🤖 AI Summary
Problem: In diffusion models, the standard denoising score identity (DSI) and the target score identity (TSI) exhibit an inherent variance trade-off: DSI suffers from high variance in the low-noise regime, whereas TSI suffers from high variance in the high-noise regime.
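For reference, assuming the common Gaussian forward process x_t = alpha_t x_0 + sigma_t eps (the summary itself does not fix a parameterization), both identities express the noisy score as a posterior expectation over p(x_0 | x_t), where p_0 is the target density:

```latex
\text{(DSI)}\quad
\nabla_{x_t}\log p_t(x_t)
  = \mathbb{E}_{p(x_0\mid x_t)}\!\left[\frac{\alpha_t x_0 - x_t}{\sigma_t^{2}}\right],
\qquad
\text{(TSI)}\quad
\nabla_{x_t}\log p_t(x_t)
  = \mathbb{E}_{p(x_0\mid x_t)}\!\left[\frac{1}{\alpha_t}\,\nabla_{x_0}\log p_0(x_0)\right].
```

Heuristically, the 1/sigma_t^2 prefactor drives the DSI variance up as sigma_t goes to 0, while the 1/alpha_t prefactor does the same for TSI as alpha_t goes to 0 at high noise, where the posterior also widens toward the prior.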
Method: We propose the Control Variate Score Identity (CVSI), a framework that unifies the data-driven (DSI) and energy-function-driven (TSI) score estimation paradigms. CVSI introduces a time-dependent optimal control coefficient, derived from the control variate method and score-matching principles, that theoretically minimizes variance across all noise scales, without requiring additional data or architectural modifications.
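A generic control-variate sketch shows how a single coefficient can interpolate between the two identities. It assumes the Gaussian forward process x_t = alpha_t x_0 + sigma_t eps and is not necessarily the parameterization or closed-form coefficient derived in the paper. Let a = (alpha_t x_0 - x_t)/sigma_t^2 be the DSI integrand and b = alpha_t^{-1} grad log p_0(x_0) the TSI integrand, with x_0 drawn from p(x_0 | x_t). Both have the noisy score as their posterior mean, so any weighting stays unbiased, and minimizing the (quadratic-in-c) variance of the weighted sum gives the textbook coefficient:

```latex
\nabla_{x_t}\log p_t(x_t)
  = \mathbb{E}_{p(x_0\mid x_t)}\!\big[\,c_t\, a + (1 - c_t)\, b\,\big]
  \quad\text{for any } c_t,
\qquad
c_t^{\star}
  = \frac{\operatorname{Var}(b) - \operatorname{Cov}(a, b)}
         {\operatorname{Var}(a) + \operatorname{Var}(b) - 2\operatorname{Cov}(a, b)}.
```

Writing the estimator as b + c_t (a - b) makes the zero-mean difference a - b the control variate. For vector-valued scores, the variances and covariance become E||a - E a||^2 and E[(a - E a)^T (b - E b)]; if c_t may depend only on t and the objective is the variance averaged over x_t, these moments are averaged over x_t as well.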
Contribution/Results: CVSI significantly reduces estimator variance across noise levels, improving sample efficiency both when training data-free samplers and during inference-time diffusion sampling. Empirical evaluation shows consistent improvements over DSI and TSI baselines across diverse noise levels, supporting its robustness and generality.
📝 Abstract
Diffusion models offer a robust framework for sampling from unnormalized probability densities, a task that requires accurately estimating the score of the noise-perturbed target distribution. While the standard Denoising Score Identity (DSI) relies on data samples, access to the target energy function enables an alternative formulation via the Target Score Identity (TSI). However, these estimators face a fundamental variance trade-off: DSI exhibits high variance in low-noise regimes, whereas TSI suffers from high variance at high noise levels. In this work, we reconcile these approaches by unifying both estimators within the principled framework of control variates. We introduce the Control Variate Score Identity (CVSI), deriving an optimal, time-dependent control coefficient that theoretically guarantees variance minimization across the entire noise spectrum. We demonstrate that CVSI serves as a robust, low-variance plug-in estimator that significantly enhances sample efficiency in both data-free sampler learning and inference-time diffusion sampling.
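A small numerical check can make the trade-off and its control-variate resolution concrete. The sketch below is a toy of our own construction, not the paper's experiment: a 1-D Gaussian target N(0, s^2), a variance-preserving forward process with alpha = sqrt(1 - sigma^2), and exact posterior sampling, which is only available because everything is Gaussian. The function name `cv_score_toy` and the plug-in coefficient (estimated from the same posterior samples) are hypothetical choices for illustration.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    # Empirical (population-normalized) covariance; _cov(xs, xs) is the variance.
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def cv_score_toy(x_t, sigma, s=1.0, n=1000, seed=0):
    """Compare DSI, TSI and control-variate score estimates at one noisy point x_t.

    Toy assumptions (hypothetical, not from the paper): 1-D prior p0 = N(0, s^2),
    forward process x_t = alpha*x0 + sigma*eps with alpha = sqrt(1 - sigma^2), 0 < sigma < 1.
    """
    rng = random.Random(seed)
    alpha = math.sqrt(1.0 - sigma**2)

    # The posterior p(x0 | x_t) is Gaussian here, so we can sample it exactly.
    post_var = 1.0 / (1.0 / s**2 + alpha**2 / sigma**2)
    post_mean = post_var * alpha * x_t / sigma**2
    x0s = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(n)]

    # Per-sample integrands; each has the true score of p_t at x_t as its posterior mean.
    dsi = [(alpha * x0 - x_t) / sigma**2 for x0 in x0s]  # denoising score identity
    tsi = [(-x0 / s**2) / alpha for x0 in x0s]           # target score identity, grad log p0(x0) = -x0/s^2

    var_d, var_t, cov_dt = _cov(dsi, dsi), _cov(tsi, tsi), _cov(dsi, tsi)
    # Variance-minimizing weight for c*DSI + (1 - c)*TSI (standard control-variate algebra),
    # plugged in from the same samples.
    c = (var_t - cov_dt) / (var_d + var_t - 2.0 * cov_dt)
    cv = [c * d + (1.0 - c) * t for d, t in zip(dsi, tsi)]

    return {
        "true_score": -x_t / (alpha**2 * s**2 + sigma**2),  # p_t = N(0, alpha^2 s^2 + sigma^2)
        "c": c,
        "mean": {"dsi": _mean(dsi), "tsi": _mean(tsi), "cv": _mean(cv)},
        "var": {"dsi": var_d, "tsi": var_t, "cv": _cov(cv, cv)},
    }


if __name__ == "__main__":
    for sigma in (0.1, 0.5, 0.99):
        r = cv_score_toy(x_t=0.5, sigma=sigma)
        variances = {k: f"{v:.3g}" for k, v in r["var"].items()}
        print(f"sigma={sigma}: c={r['c']:.3f}, per-sample variances={variances}")
```

In this toy the DSI integrand dominates the variance at low noise and the TSI integrand at high noise. Because both integrands are affine in x0, the weight c (which comes out close to sigma^2, about 0.01 at sigma = 0.1 and 0.98 at sigma = 0.99) cancels the x0 dependence, and the combined estimator has essentially zero variance. For non-Gaussian targets one should expect only a partial reduction, and estimating c from the same samples adds a small finite-sample bias.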