Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures

📅 2019-10-29
🏛️ arXiv.org
📈 Citations: 29
Influential: 6
📄 PDF
🤖 AI Summary
Gradient estimation for non-Gaussian distributions in variational inference and generative modeling suffers from low accuracy due to reliance on sampling-based or numerical approximations. Method: We systematically generalize Stein's lemma to generalized exponential-family mixture distributions—including full-covariance Gaussians, t-distributions, skew Gaussians, and normal-inverse Gaussian distributions—thereby extending beyond the classical restriction to standard Gaussians. Integrating exponential family theory, probabilistic differential operator analysis, and reparameterization techniques, we derive exact analytical expressions for expected gradients under weak regularity conditions and obtain explicit Stein gradient identities for multiple non-standard distributions. Contribution/Results: Our approach eliminates the need for Monte Carlo sampling or numerical differentiation, ensuring theoretical rigor while substantially improving gradient estimation accuracy and broadening the class of admissible distributions. This provides a novel, differentiable tool for efficient variational learning and differentiable generative modeling of complex probabilistic models.
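For background, the classical full-covariance Gaussian case that the paper generalizes can be written as follows (a standard statement of Stein's lemma, not quoted from the paper):

```latex
% Stein's lemma for x ~ N(mu, Sigma), with f differentiable and
% satisfying mild growth conditions:
\mathbb{E}_{x \sim \mathcal{N}(\mu, \Sigma)}\bigl[(x - \mu)\, f(x)\bigr]
  = \Sigma \, \mathbb{E}_{x \sim \mathcal{N}(\mu, \Sigma)}\bigl[\nabla_x f(x)\bigr]
```

Combined with the score-function identity $\nabla_\mu \mathbb{E}[f(x)] = \Sigma^{-1}\,\mathbb{E}[(x-\mu)\,f(x)]$, this gives the reparameterization-style gradient identity $\nabla_\mu \mathbb{E}[f(x)] = \mathbb{E}[\nabla_x f(x)]$ (Bonnet's theorem); the paper extends identities of this kind beyond the Gaussian family.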
📝 Abstract
Stein's method (Stein, 1973; 1981) is a powerful tool for statistical applications, and has had a significant impact in machine learning. Stein's lemma plays an essential role in Stein's method. Previous applications of Stein's lemma either required strong technical assumptions or were limited to Gaussian distributions with restricted covariance structures. In this work, we extend Stein's lemma to exponential-family mixture distributions including Gaussian distributions with full covariance structures. Our generalization enables us to establish a connection between Stein's lemma and the reparameterization trick to derive gradients of expectations of a large class of functions under weak assumptions. Using this connection, we can derive many new reparameterizable gradient identities that go beyond the reach of existing works. For example, we give gradient identities when the expectation is taken with respect to Student's t-distribution, skew Gaussian, exponentially modified Gaussian, and normal inverse Gaussian.
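The Gaussian special case of the connection described in the abstract can be checked numerically. The sketch below (a minimal illustration, not the paper's method; the test function `f` is an arbitrary choice) verifies the 1-D Stein identity E[(x − μ)f(x)] = σ²E[f′(x)] and the reparameterization-trick gradient ∇_μ E[f(x)] = E[f′(x)] for x ~ N(μ, σ²):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7

# Reparameterization: x = mu + sigma * eps, eps ~ N(0, 1)
eps = rng.standard_normal(200_000)
x = mu + sigma * eps

f = lambda t: t**2        # test function (arbitrary choice)
df = lambda t: 2.0 * t    # its derivative

# Stein's lemma: E[(x - mu) f(x)] = sigma^2 * E[f'(x)]
lhs = np.mean((x - mu) * f(x))
rhs = sigma**2 * np.mean(df(x))

# Reparameterization-trick gradient of E[f(x)] w.r.t. mu:
# d/dmu E[f(mu + sigma*eps)] = E[f'(x)]  (Bonnet's theorem)
grad_mu_mc = np.mean(df(x))
grad_mu_exact = 2.0 * mu  # closed form, since E[f(x)] = mu^2 + sigma^2

print(lhs, rhs, grad_mu_mc, grad_mu_exact)
```

With enough samples the two sides of the Stein identity agree, and the Monte Carlo gradient matches the closed-form value 2μ; the paper's contribution is exact identities of this form for non-Gaussian families such as Student's t and skew Gaussian.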
Problem

Research questions and friction points this paper is trying to address.

Stein's Lemma
Exponential Family Models
Complex Gaussian Distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stein's Lemma
Exponential Family Models
Gaussian Distributions
Wu Lin
University of British Columbia, Vancouver, Canada
M. E. Khan
RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Mark Schmidt
Professor of Computer Science, University of British Columbia
Machine Learning · Optimization