🤖 AI Summary
This work addresses the training instability and codebook collapse that afflict vector-quantized variational autoencoders (VQ-VAEs), both of which arise from the tight coupling between representation learning and codebook optimization. To resolve this, the authors propose the VP-VAE framework, which decouples representation learning from quantization by modeling the quantizer as an adaptive perturbation in the latent space, thereby eliminating the need for an explicit codebook during training. Leveraging Metropolis–Hastings sampling, the method generates perturbations that are both distribution-consistent and scale-adaptive. Under the assumption of uniformly distributed latent variables, a lightweight variant termed FSP is derived, offering both a unified theoretical interpretation and practical enhancements for fixed quantizers. Experiments demonstrate that the proposed approach significantly improves reconstruction fidelity on image and audio tasks, promotes more balanced token usage, and enhances training stability and robustness.
📝 Abstract
Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse" due to the inherent coupling of representation learning and discrete codebook optimization. In this paper, we propose VP-VAE (Vector Perturbation VAE), a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training. Our key insight is that, from the neural network's viewpoint, performing quantization primarily manifests as injecting a structured perturbation in latent space. Accordingly, VP-VAE replaces the non-differentiable quantizer with distribution-consistent and scale-adaptive latent perturbations generated via Metropolis–Hastings sampling. This design enables stable training without a codebook while making the model robust to inference-time quantization error. Moreover, under the assumption of approximately uniform latent variables, we derive FSP (Finite Scalar Perturbation), a lightweight variant of VP-VAE that provides a unified theoretical explanation and a practical improvement for FSQ-style fixed quantizers. Extensive experiments on image and audio benchmarks demonstrate that VP-VAE and FSP improve reconstruction fidelity and achieve substantially more balanced token usage, while avoiding the instability inherent to coupled codebook training.
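To make the central idea concrete, the sketch below shows what "replacing the quantizer with a Metropolis–Hastings-sampled latent perturbation" could look like in its simplest form. This is a minimal illustration, not the paper's implementation: the target density (a zero-mean Gaussian whose standard deviation scales with the latent magnitude, standing in for the quantization-noise distribution), the function name `mh_perturbation`, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

def mh_perturbation(z, scale=0.1, n_steps=200, rng=None):
    """Perturb latent z with a delta drawn via random-walk Metropolis-Hastings.

    The target density here is a stand-in: a zero-mean Gaussian whose
    per-dimension std adapts to |z| (the "scale-adaptive" property).
    The paper's actual target matches the quantization-noise distribution.
    """
    rng = rng or np.random.default_rng(0)
    sigma = scale * (np.abs(z) + 1e-8)           # scale-adaptive target std
    log_target = lambda d: -0.5 * np.sum((d / sigma) ** 2)

    delta = np.zeros_like(z)                     # start from zero perturbation
    lp = log_target(delta)
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the MH ratio is just the
        # target-density ratio.
        proposal = delta + rng.normal(0.0, scale, size=z.shape)
        lp_prop = log_target(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            delta, lp = proposal, lp_prop
    return z + delta                             # perturbed latent

z = np.array([0.5, -1.2, 2.0])
z_tilde = mh_perturbation(z)
```

During training, `z_tilde` would be fed to the decoder in place of the quantized latent; because the perturbation is sampled rather than produced by a hard nearest-neighbor lookup, the path stays differentiable and no codebook is maintained.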