🤖 AI Summary
To address the high-variance gradient estimation problem in variational autoencoder (VAE) training—arising from stochastic sampling of latent variables—this paper introduces the “Silent Gradient” method. It constructs specialized decoder architectures (e.g., linear or analytically tractable forms) to enable closed-form analytical computation of the evidence lower bound (ELBO) expectation, yielding zero-variance gradients. A progressive annealing strategy is further proposed to dynamically integrate these exact gradients into training deep nonlinear models, preserving representational capacity while substantially improving optimization stability. Experiments across multiple benchmark datasets demonstrate consistent superiority over mainstream gradient estimators—including the reparameterization trick, Gumbel-Softmax, and REINFORCE—in terms of convergence speed, final likelihood, and robustness to hyperparameter choices. The method thus offers enhanced effectiveness, robustness, and generalizability for VAE-based learning.
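The key idea, that a linear decoder makes the expected reconstruction term of the ELBO available in closed form, can be illustrated with a small NumPy sketch. This is a hedged illustration, not the paper's implementation: the decoder `W`, `b`, the encoder parameters `mu`, `sigma`, and the sample count `S` are all made up here. For a Gaussian posterior q(z|x) = N(mu, diag(sigma^2)) and a linear map z ↦ Wz + b, the expected squared reconstruction error is exactly ||x − Wmu − b||² + Σⱼ σⱼ²·||W[:, j]||², with no sampling and hence zero estimator variance:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 5, 3                       # data dim, latent dim (arbitrary toy sizes)
W = rng.normal(size=(d, k))       # hypothetical linear decoder weights
b = rng.normal(size=d)
x = rng.normal(size=d)            # one data point
mu = rng.normal(size=k)           # encoder mean of q(z|x) = N(mu, diag(sigma^2))
sigma = rng.uniform(0.5, 1.5, size=k)

# Closed-form expectation under q (zero-variance "silent" quantity):
# E_q ||x - W z - b||^2 = ||x - W mu - b||^2 + sum_j sigma_j^2 * ||W[:, j]||^2
exact = np.sum((x - W @ mu - b) ** 2) + np.sum(sigma ** 2 * np.sum(W ** 2, axis=0))

# Monte Carlo estimate of the same quantity, as a stochastic
# (reparameterized) estimator would see it -- this one carries variance.
S = 200_000
z = mu + sigma * rng.normal(size=(S, k))
mc = np.mean(np.sum((x - z @ W.T - b) ** 2, axis=1))

print(exact, mc)  # the two agree in expectation; only mc fluctuates
```

The same decomposition extends to the full ELBO because the diagonal-Gaussian KL term is already analytical; the stochastic part is confined to the reconstruction expectation shown above.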
📝 Abstract
Training deep generative models like Variational Autoencoders (VAEs) is often hindered by the need to backpropagate gradients through the stochastic sampling of their latent variables, a process that inherently introduces estimation variance, which can slow convergence and degrade performance. In this paper, we propose a new perspective, which we call Silent Gradients, that sidesteps this problem. Instead of improving stochastic estimators, we leverage specific decoder architectures to analytically compute the expected ELBO, yielding a gradient with zero variance. We first provide a theoretical foundation for this method and demonstrate its superiority over existing estimators in a controlled setting with a linear decoder. To generalize our approach for practical use with complex, expressive decoders, we introduce a novel training dynamic that uses the exact, zero-variance gradient to guide the early stages of encoder training before annealing to a standard stochastic estimator. Our experiments show that this technique consistently improves the performance of established baselines, including reparameterization, Gumbel-Softmax, and REINFORCE, across multiple datasets. This work opens a new direction for training generative models by combining the stability of analytical computation with the expressiveness of deep, nonlinear architectures.
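The training dynamic described above, exact gradients guiding early training before handing off to a stochastic estimator, can be sketched as a simple blend. This is an assumption-laden illustration: the linear ramp `alpha`, the horizon `anneal_steps`, and the function name `annealed_gradient` are hypothetical choices, not the schedule used in the paper.

```python
import numpy as np

def annealed_gradient(exact_grad, stochastic_grad, step, anneal_steps=1000):
    """Blend the zero-variance analytical gradient with a stochastic one.

    alpha ramps linearly from 0 (purely exact) to 1 (purely stochastic).
    This linear schedule is a placeholder; any monotone ramp would fit
    the description in the abstract.
    """
    alpha = min(step / anneal_steps, 1.0)
    return (1.0 - alpha) * exact_grad + alpha * stochastic_grad

# Toy gradients standing in for the encoder's ELBO gradients.
g_exact = np.array([1.0, -2.0])
g_stoch = np.array([1.2, -1.8])

print(annealed_gradient(g_exact, g_stoch, step=0))     # purely exact gradient
print(annealed_gradient(g_exact, g_stoch, step=1000))  # purely stochastic gradient
```

Early steps thus inherit the stability of the analytical gradient, while later steps recover the full expressiveness of the deep, nonlinear decoder that only the stochastic estimator can handle.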