🤖 AI Summary
This paper addresses the fundamental mismatch between the continuous latent space of standard Variational Autoencoders (VAEs) and the inherently discrete nature of data such as text. To resolve this, we propose the Discrete VAE—a VAE explicitly designed for categorical latent variables. Methodologically, we rigorously derive the evidence lower bound (ELBO) from first principles of variational inference for categorical latent variables, and we employ the Gumbel-Softmax reparameterization to obtain differentiable gradient estimates through discrete sampling. Our key contributions are threefold: (1) a tutorial-style, unified theoretical framework for discrete VAEs; (2) a robust and reproducible training paradigm; and (3) publicly released, fully functional code. Experiments demonstrate that the Discrete VAE improves interpretability and structural coherence in discrete data generation, outperforming continuous-latent baselines while preserving principled probabilistic modeling.
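For reference, the ELBO mentioned above takes the standard variational-inference form; the notation below is a common convention and is assumed here rather than taken from the paper:

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)$$

where $q_\phi(z \mid x)$ is the encoder's approximate posterior, $p_\theta(x \mid z)$ is the decoder's likelihood, and $p(z)$ is the prior over latents. In the discrete setting, $q_\phi$ and $p(z)$ are categorical distributions (e.g., a uniform prior over $K$ categories), which makes the KL term computable in closed form but leaves the sampling step non-differentiable, motivating the Gumbel-Softmax relaxation.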
📝 Abstract
Variational Autoencoders (VAEs) are well-established as a principled approach to probabilistic unsupervised learning with neural networks. Typically, an encoder network defines the parameters of a Gaussian-distributed latent space from which we can sample and pass realizations to a decoder network. This model is trained to reconstruct its inputs and is optimized through the evidence lower bound. In recent years, discrete latent spaces have grown in popularity, suggesting that they may be a natural choice for many data modalities (e.g., text). In this tutorial, we provide a rigorous, yet practical, introduction to discrete variational autoencoders -- specifically, VAEs in which the latent space is made up of latent variables that follow a categorical distribution. We assume only a basic mathematical background, with which we carefully derive each step from first principles. From there, we develop a concrete training recipe and provide an example implementation, hosted at https://github.com/alanjeffares/discreteVAE.
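The Gumbel-Softmax trick referenced in the summary can be sketched in a few lines. The following is a minimal NumPy illustration of the relaxation itself (not the paper's implementation; function and variable names are ours): Gumbel noise is added to the categorical logits and a temperature-scaled softmax produces an approximately one-hot sample through which gradients can flow.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw a relaxed (approximately one-hot) categorical sample.

    g_i ~ Gumbel(0, 1), y = softmax((logits + g) / temperature).
    As temperature -> 0, y approaches a hard one-hot sample; larger
    temperatures give smoother, more uniform vectors.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Gumbel(0, 1) noise via the inverse-CDF: -log(-log(U)).
    u = rng.uniform(size=np.shape(logits))
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)
    scaled = (np.asarray(logits) + gumbel) / temperature
    # Numerically stable softmax over the last axis.
    exp = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Example: a 3-category latent with probabilities (0.7, 0.2, 0.1).
logits = np.log(np.array([0.7, 0.2, 0.1]))
y = sample_gumbel_softmax(logits, temperature=0.5)
```

In a full training loop, the same computation would be written in an autodiff framework so that the softmax output `y` carries gradients back to the encoder's logits, with the temperature typically annealed over training.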