Zero-Variance Gradients for Variational Autoencoders

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high-variance gradient estimation problem in variational autoencoder (VAE) training—arising from stochastic sampling of latent variables—this paper introduces the “Silent Gradient” method. It constructs specialized decoder architectures (e.g., linear or otherwise analytically tractable forms) that allow the expected evidence lower bound (ELBO) to be computed in closed form, yielding zero-variance gradients. A progressive annealing strategy is further proposed to integrate these exact gradients into the training of deep nonlinear models, preserving representational capacity while substantially improving optimization stability. Experiments across multiple benchmark datasets show consistent gains over mainstream gradient estimators—including the reparameterization trick, Gumbel-Softmax, and REINFORCE—in convergence speed, final likelihood, and robustness to hyperparameter choices.

📝 Abstract
Training deep generative models like Variational Autoencoders (VAEs) is often hindered by the need to backpropagate gradients through the stochastic sampling of their latent variables, a process that inherently introduces estimation variance, which can slow convergence and degrade performance. In this paper, we propose a new perspective, which we call Silent Gradients, that sidesteps this problem. Instead of improving stochastic estimators, we leverage specific decoder architectures to analytically compute the expected ELBO, yielding a gradient with zero variance. We first provide a theoretical foundation for this method and demonstrate its superiority over existing estimators in a controlled setting with a linear decoder. To generalize our approach for practical use with complex, expressive decoders, we introduce a novel training dynamic that uses the exact, zero-variance gradient to guide the early stages of encoder training before annealing to a standard stochastic estimator. Our experiments show that this technique consistently improves the performance of established baselines, including reparameterization, Gumbel-Softmax, and REINFORCE, across multiple datasets. This work opens a new direction for training generative models by combining the stability of analytical computation with the expressiveness of deep, nonlinear architectures.
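The linear-decoder case the abstract mentions can be made concrete. For a Gaussian posterior q(z|x) = N(μ, diag(σ²)) and a linear Gaussian decoder p(x|z) = N(x; Wz + b, I), the expected reconstruction term has a closed form, so the whole expected ELBO (and hence its gradient) can be computed without sampling. The sketch below is illustrative only (the variable names and toy dimensions are assumptions, not from the paper); it checks the closed form against a large-sample Monte Carlo estimate of the same expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 4, 2

# Toy instance: data point x, posterior parameters (mu, sigma), linear decoder (W, b).
x = rng.normal(size=d_x)
mu = rng.normal(size=d_z)
sigma = np.exp(0.1 * rng.normal(size=d_z))
W = rng.normal(size=(d_x, d_z))
b = rng.normal(size=d_x)

def analytic_expected_elbo(x, mu, sigma, W, b):
    """Closed-form E_q[log p(x|z)] - KL(q(z|x) || N(0, I)) for a linear decoder."""
    resid = x - W @ mu - b
    # E_q ||x - Wz - b||^2 = ||x - W mu - b||^2 + sum_j sigma_j^2 ||W[:, j]||^2
    exp_sq_err = resid @ resid + np.sum(sigma**2 * np.sum(W**2, axis=0))
    log_lik = -0.5 * (exp_sq_err + len(x) * np.log(2 * np.pi))
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 2 * np.log(sigma) - 1)
    return log_lik - kl

def monte_carlo_elbo(x, mu, sigma, W, b, n=200_000):
    """Reparameterized Monte Carlo estimate of the same expected ELBO."""
    eps = rng.normal(size=(n, len(mu)))
    z = mu + sigma * eps
    resid = x - z @ W.T - b
    log_lik = -0.5 * (np.sum(resid**2, axis=1) + len(x) * np.log(2 * np.pi))
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 2 * np.log(sigma) - 1)
    return log_lik.mean() - kl

exact = analytic_expected_elbo(x, mu, sigma, W, b)
mc = monte_carlo_elbo(x, mu, sigma, W, b)
print(exact, mc)  # the two values should agree up to Monte Carlo noise
```

Because the analytic value is deterministic in the parameters, differentiating it gives a gradient with zero variance, whereas the Monte Carlo estimate fluctuates from batch to batch.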
Problem

Research questions and friction points this paper is trying to address.

Eliminate gradient variance in VAE training
Compute expected ELBO analytically for zero-variance gradients
Improve convergence and performance of deep generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analytical ELBO computation for zero-variance gradients
Silent Gradients bypass stochastic sampling issues
Annealing training dynamic combines exact and stochastic gradients
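The annealing dynamic described above can be sketched as a simple convex blend of the two gradients. The linear schedule and parameter names here are hypothetical placeholders (the paper's actual schedule is not given on this page); the point is only that early steps follow the exact zero-variance gradient and later steps the stochastic estimator:

```python
import numpy as np

def annealed_gradient(g_exact, g_stochastic, step, anneal_steps=1000):
    """Blend the zero-variance analytic gradient with a stochastic estimator.

    alpha ramps linearly from 0 to 1 over `anneal_steps` (an assumed
    schedule), so training starts on the exact gradient and gradually
    hands over to the stochastic one for expressive decoders.
    """
    alpha = min(step / anneal_steps, 1.0)
    return (1.0 - alpha) * g_exact + alpha * g_stochastic

g_exact = np.array([1.0, -2.0])
g_stoch = np.array([0.5, -1.0])
print(annealed_gradient(g_exact, g_stoch, step=0))     # pure exact gradient
print(annealed_gradient(g_exact, g_stoch, step=500))   # 50/50 blend
print(annealed_gradient(g_exact, g_stoch, step=2000))  # pure stochastic gradient
```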