Vector Quantization using Gaussian Variational Autoencoder

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
VQ-VAEs suffer from training instability due to hard discrete quantization. To address this, we propose Gaussian Quantization (GQ), the first framework establishing theoretical equivalence between Gaussian VAEs with Target Divergence Constraint (TDC) and VQ-VAEs—enabling direct, zero-shot conversion of pre-trained Gaussian VAEs into efficient discrete encoders. We prove that, when the logarithm of the codebook size exceeds the bits-back rate, the quantization error is bounded. GQ integrates TDC-regularized optimization, stochastic noise-based codebook initialization, nearest-neighbor quantization, and bits-back coding. Extensive experiments on UNet and ViT backbones demonstrate that GQ consistently outperforms state-of-the-art discrete autoencoders—including VQGAN, FSQ, LFQ, and BSQ—across reconstruction fidelity, perceptual quality, and downstream task performance. Moreover, TDC significantly boosts baseline models such as TokenBridge. Our implementation is publicly available.

📝 Abstract
Vector quantized variational autoencoder (VQ-VAE) is a discrete auto-encoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE with a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the closest noise to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train Gaussian VAEs for effective GQ, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. The source code is available at https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE.
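The core GQ step described in the abstract — sample a random Gaussian codebook, then snap each posterior mean to its nearest code — can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names and shapes here are assumptions.

```python
import numpy as np

def gaussian_quant(mu, codebook):
    """Nearest-neighbor quantization of posterior means against a random
    Gaussian codebook (illustrative sketch of the GQ idea)."""
    # Squared Euclidean distance from each mean to every code vector
    d2 = ((mu[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d2.argmin(axis=1)        # one discrete token per latent
    return codebook[tokens], tokens

rng = np.random.default_rng(0)
K, D = 1024, 16                         # codebook size, latent dimension
codebook = rng.standard_normal((K, D))  # codes are i.i.d. N(0, I) noise
mu = rng.standard_normal((8, D))        # posterior means from an encoder
z_q, tokens = gaussian_quant(mu, codebook)
```

Note that no training is involved: the codebook is pure noise, and the theoretical result says the quantization error stays small as long as log K exceeds the VAE's bits-back coding rate.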
Problem

Research questions and friction points this paper is trying to address.

Proposes Gaussian Quant to convert a Gaussian VAE into a VQ-VAE without any training
Addresses the training instability that discretization causes in VQ-VAEs with a simple, training-free technique
Improves performance over prior VQ-VAE methods on both UNet and ViT architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Quant converts a pre-trained Gaussian VAE into a VQ-VAE without training
Uses random Gaussian noise as the codebook, with nearest-neighbor quantization of posterior means
Target divergence constraint (TDC) trains the Gaussian VAE so that quantization is effective
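The TDC heuristic in the last bullet can be sketched as a regularizer that pins the Gaussian VAE's KL divergence to a fixed target. This is one plausible reading of "target divergence constraint"; the function names and the exact penalty form are assumptions, not the paper's method. The motivation from the abstract: the KL rate is also the bits-back coding rate, so fixing it in advance lets one choose a codebook size K with log K above that rate.

```python
import numpy as np

def kl_gaussian(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims
    return 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(-1)

def tdc_penalty(mu, logvar, target_kl):
    # Penalize deviation of the batch-mean KL from a fixed target value
    # (hypothetical form; the paper's constraint may be implemented
    #  differently, e.g. as a hard constraint or a Lagrangian term).
    kl = kl_gaussian(mu, logvar).mean()
    return (kl - target_kl) ** 2
```

In training, this penalty would be added to the usual VAE reconstruction loss so the posterior's rate lands at the chosen target rather than drifting freely.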