🤖 AI Summary
Variational inference (VI) is widely employed in variational autoencoders (VAEs) and diffusion models, yet it is taught and interpreted differently across disciplines: statisticians view it primarily as a tool for Bayesian posterior approximation, whereas machine learning practitioners deploy it within a frequentist framework to approximate maximum likelihood estimators, creating a conceptual barrier between the two communities.
Method: This paper develops a self-contained frequentist account of VI, unifying VAEs and denoising diffusion models as neural-network-parameterized instances of VI in latent-variable models, with the EM algorithm serving as the conceptual anchor and all derivations grounded in maximum likelihood estimation.
Contribution/Results: The work bridges a long-standing gap between classical statistical inference and deep generative modeling. It gives statisticians a prior-free, interpretable route to understanding modern generative models and formally grounds VAEs and DDMs in frequentist principles, thereby establishing a coherent statistical foundation for VI-based deep generative methods.
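The EM-to-VI path described above rests on a standard log-likelihood decomposition (the notation here is the usual one, not copied from the paper): for any density q over the latent variable z, the log-likelihood splits into an evidence lower bound (ELBO) plus a nonnegative KL gap,

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]}_{\text{ELBO}}
  \;+\; \mathrm{KL}\bigl(q(z) \,\|\, p_\theta(z \mid x)\bigr).
```

Since the KL term is nonnegative, maximizing the ELBO over q (the E-step, or its variational approximation when the exact posterior is intractable) and over θ (the M-step) carries out maximum likelihood estimation, with no prior placed on θ.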
📝 Abstract
While Variational Inference (VI) is central to modern generative models like Variational Autoencoders (VAEs) and Denoising Diffusion Models (DDMs), its pedagogical treatment is split across disciplines. In statistics, VI is typically framed as a Bayesian method for posterior approximation. In machine learning, however, VAEs and DDMs are developed from a Frequentist viewpoint, where VI is used to approximate a maximum likelihood estimator. This creates a barrier for statisticians, as the principles behind VAEs and DDMs are hard to contextualize without a corresponding Frequentist introduction to VI. This paper provides that introduction: we explain the theory for VI, VAEs, and DDMs from a purely Frequentist perspective, starting with the classical Expectation-Maximization (EM) algorithm. We show how VI arises as a scalable solution for intractable E-steps and how VAEs and DDMs are natural, deep-learning-based extensions of this framework, thereby bridging the gap between classical statistical inference and modern generative AI.
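The Frequentist reading of the abstract can be made concrete with a toy numerical check (a minimal sketch with a hand-set linear-Gaussian model and fixed "encoder"/"decoder" parameters, chosen here for illustration and not taken from the paper): a Monte Carlo ELBO estimate lower-bounds the exact log-likelihood log p_θ(x), which is what VAE training maximizes over θ.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def elbo(x, enc_w=0.5, dec_w=2.0, n_samples=10_000):
    """Monte Carlo ELBO for the toy model
    p(z) = N(0, 1),  p(x|z) = N(dec_w * z, 1),  q(z|x) = N(enc_w * x, 0.5)."""
    mu, var = enc_w * x, 0.5
    # Reparameterization trick: sample z = mu + sqrt(var) * eps, eps ~ N(0, 1).
    z = mu + np.sqrt(var) * rng.standard_normal(n_samples)
    recon = gaussian_logpdf(x, dec_w * z, 1.0).mean()   # E_q[log p(x|z)]
    kl = 0.5 * (var + mu**2 - 1.0 - np.log(var))        # KL(q || N(0,1)), closed form
    return recon - kl

x = 1.0
# In this linear-Gaussian model the marginal is exact: p(x) = N(0, dec_w^2 + 1).
log_px = gaussian_logpdf(x, 0.0, 2.0**2 + 1.0)
print(f"ELBO = {elbo(x):.3f} <= log p(x) = {log_px:.3f}")
```

The gap between the two printed numbers is exactly KL(q(z|x) || p(z|x)); a VAE closes it by learning the encoder parameters, so maximizing the ELBO approximates the maximum likelihood estimator of θ.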