Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study

📅 2025-06-18
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing vector quantization (VQ) methods suffer from two key challenges: training instability—caused by gradient mismatch due to the straight-through estimator (STE)—and codebook collapse, stemming from poor codevector utilization. Both issues arise fundamentally from misalignment between the input feature distribution and the codebook distribution. This paper proposes a Wasserstein-distance-based distribution alignment optimization framework—the first to incorporate the Wasserstein distance directly into the VQ objective—to jointly mitigate gradient mismatch and underutilization from a distribution-matching perspective. We provide theoretical guarantees on convergence and quantization error bounds, and design a differentiable distribution alignment mechanism alongside an improved STE. Experiments demonstrate near-perfect codebook utilization (~100%), significantly reduced quantization error, and consistent improvements in reconstruction quality and training stability across multiple autoregressive modeling tasks.

📝 Abstract
The success of autoregressive models largely depends on the effectiveness of vector quantization, a technique that discretizes continuous features by mapping them to the nearest code vectors within a learnable codebook. Two critical issues in existing vector quantization methods are training instability and codebook collapse. Training instability arises from the gradient discrepancy introduced by the straight-through estimator, especially in the presence of significant quantization errors, while codebook collapse occurs when only a small subset of code vectors are utilized during training. A closer examination of these issues reveals that they are primarily driven by a mismatch between the distributions of the features and code vectors, leading to unrepresentative code vectors and significant data information loss during compression. To address this, we employ the Wasserstein distance to align these two distributions, achieving near 100% codebook utilization and significantly reducing the quantization error. Both empirical and theoretical analyses validate the effectiveness of the proposed approach.
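The abstract's two diagnostics, codebook utilization and quantization error, and its proposed remedy, distribution alignment via the Wasserstein distance, can be illustrated concretely. The following is a minimal NumPy sketch, not the paper's implementation: it performs nearest-codevector quantization, measures utilization and error, and computes the empirical 1-D Wasserstein-1 distance between two equal-size samples (for sorted equal-weight samples, W1 is the mean absolute difference). All names (`quantize`, `w1_1d`) and the toy Gaussian data are illustrative assumptions.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest code vector under L2 distance."""
    # (N, 1, D) - (1, K, D) -> (N, K) squared distances
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def w1_1d(x, y):
    """Empirical 1-D Wasserstein-1 distance between equal-size samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
feats = rng.normal(0.0, 1.0, size=(256, 2))       # feature distribution
codebook = rng.normal(3.0, 1.0, size=(16, 2))     # deliberately mismatched codebook

quantized, idx = quantize(feats, codebook)
usage = len(np.unique(idx)) / len(codebook)        # fraction of code vectors used
err = np.mean(((feats - quantized) ** 2).sum(-1))  # mean squared quantization error
```

With the mismatched codebook above, utilization is low and the quantization error is large; shifting the codebook toward the feature distribution (which the paper does by minimizing a Wasserstein term) shrinks both. The `w1_1d` helper shows the per-dimension quantity such an alignment objective would drive toward zero.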
Problem

Research questions and friction points this paper is trying to address.

Addresses training instability in vector quantization methods
Resolves codebook collapse by improving code vector utilization
Reduces quantization error via feature-code distribution alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Wasserstein distance for distribution alignment
Achieves near 100% codebook utilization
Reduces quantization error significantly
👥 Authors
Xianghong Fang — University of Toronto
Litao Guo — The Hong Kong University of Science and Technology
Hengchao Chen — University of Toronto (manifold statistics, optimization)
Yuxuan Zhang — University of Toronto
Xiaofan Xia — University of Toronto
Dingjie Song — Lehigh University; CUHK-Shenzhen; Nanjing University (multimodal learning, large language models)
Yexin Liu — The Hong Kong University of Science and Technology (AIGC)
Hao Wang — Southern University of Science and Technology
Harry Yang — HKUST (computer vision, machine learning)
Yuan Yuan — Boston College
Qiang Sun — University of Toronto