🤖 AI Summary
This paper investigates the non-asymptotic statistical properties of Contrastive Divergence (CD) for training unnormalized models. The central questions are whether CD achieves the parametric convergence rate $O(n^{-1/2})$ and whether its asymptotic variance can approach the Cramér–Rao lower bound. Focusing on exponential-family models under standard regularity conditions, the authors analyze CD under several data batching schemes, including the fully online and minibatch settings. Whereas prior work established only an asymptotic $O(n^{-1/3})$ convergence rate, they give a non-asymptotic proof that CD attains the $O(n^{-1/2})$ rate and show that its asymptotic variance can be brought close to the Cramér–Rao bound. These results indicate that CD can be both statistically near-optimal and practically feasible, providing theoretical support for parameter estimation in unnormalized generative models in the non-asymptotic regime.
📝 Abstract
We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1 / 3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1 / 2})$. Our analysis provides results for various data batching schemes, including the fully online and minibatch ones. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.
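To make the setting concrete, below is a minimal, illustrative sketch of online CD-$k$ for a one-dimensional exponential family $p_\theta(x) \propto \exp(\theta\, T(x) - x^2/2)$ with $T(x) = x$ (a unit-variance Gaussian with mean $\theta$). The specific model, the random-walk Metropolis kernel, the choice of $k$, and the step-size schedule are all hypothetical choices for illustration, not the paper's exact algorithm or assumptions.

```python
# Illustrative online CD-k for a 1-D exponential family
# p_theta(x) ∝ exp(theta * x - x**2 / 2), i.e. N(theta, 1) with T(x) = x.
# The MCMC kernel, k, and step sizes below are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)

def log_unnorm(theta, x):
    """Unnormalized log-density of the exponential-family model."""
    return theta * x - 0.5 * x**2

def mcmc_step(theta, x, scale=1.0):
    """One random-walk Metropolis step targeting p_theta."""
    prop = x + scale * rng.standard_normal()
    log_accept = log_unnorm(theta, prop) - log_unnorm(theta, x)
    return prop if np.log(rng.uniform()) < log_accept else x

def cd_online(data, k=5, theta0=0.0):
    """Fully online CD-k: one parameter update per data point.

    Update: theta <- theta + eta_t * (T(x_t) - T(x_tilde)), where x_tilde
    is obtained by running k MCMC steps started at the data point x_t.
    """
    theta = theta0
    for t, x in enumerate(data, start=1):
        x_tilde = x
        for _ in range(k):
            x_tilde = mcmc_step(theta, x_tilde)
        eta = 1.0 / np.sqrt(t)          # hypothetical step-size schedule
        theta += eta * (x - x_tilde)    # T(x) - T(x_tilde) with T(x) = x
    return theta

true_theta = 2.0
data = true_theta + rng.standard_normal(5000)   # i.i.d. draws from N(true_theta, 1)
print(cd_online(data))                          # estimate should land near 2.0
```

A minibatch variant would simply average $T(x) - T(\tilde{x})$ over a batch before each update; the paper's analysis covers such batching schemes, but the exact schedules and constants are beyond this sketch.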