Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

📅 2024-11-04
🏛️ arXiv.org
📈 Citations: 17
Influential: 0
📄 PDF
🤖 AI Summary
Vector quantization (VQ) in unsupervised learning often suffers from representation collapse, leading to low codebook utilization and degenerate latent spaces, thereby limiting model scalability. This paper proposes SimVQ, a novel VQ variant that reparameterizes the entire codebook via a learnable linear layer, shifting optimization from updating only the code vector selected by nearest-neighbor search to optimizing the entire linear subspace spanned by the codebook. Grounded in a theoretical analysis of the collapse mechanism, SimVQ integrates seamlessly into standard VQ frameworks without requiring auxiliary regularization or dimensionality reduction. Evaluated on image and audio tasks across modalities, SimVQ introduces only a lightweight linear transformation yet achieves substantial improvements in codebook utilization and downstream performance while effectively mitigating collapse. The implementation is publicly available.

📝 Abstract
Vector Quantization (VQ) is a widely used method for converting continuous representations into discrete codes, and it has become fundamental in unsupervised representation learning and latent generative models. However, VQ models are often hindered by representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for large-scale training. Existing methods designed to mitigate representation collapse typically reduce the dimensionality of the latent space at the expense of model capacity, which does not fully resolve the core issue. In this study, we conduct a theoretical analysis of representation collapse in VQ models and identify its primary cause as the disjoint optimization of the codebook, where only a small subset of code vectors is updated through gradient descent. To address this issue, we propose SimVQ, a novel method which reparameterizes the code vectors through a linear transformation layer based on a learnable latent basis. This transformation optimizes the entire linear space spanned by the codebook, rather than merely updating the code vector selected by the nearest-neighbor search in vanilla VQ models. Although it is commonly understood that the multiplication of two linear matrices is equivalent to applying a single linear layer, our approach works surprisingly well in resolving the collapse issue in VQ models with just one linear layer. We validate the efficacy of SimVQ through extensive experiments across various modalities, including image and audio data, with different model architectures. Our code is available at https://github.com/youngsheen/SimVQ.
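The reparameterization described in the abstract is easy to illustrate. Below is a minimal, hedged sketch of the idea in PyTorch; it is not the authors' released implementation (see the repository linked above), and the frozen basis, tensor shapes, commitment loss, and straight-through details are assumptions made for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimVQSketch(nn.Module):
    """Vector quantizer with the codebook reparameterized as (latent basis) x (one linear layer)."""

    def __init__(self, codebook_size: int = 1024, dim: int = 256):
        super().__init__()
        # Latent basis for the codebook. Kept frozen here for illustration;
        # whether it is also trained is a detail of the authors' released code.
        self.register_buffer("basis", torch.randn(codebook_size, dim))
        # The single linear layer W: each gradient step moves the whole
        # subspace spanned by the codebook, not just the selected code vectors.
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) continuous encoder outputs.
        codebook = self.proj(self.basis)            # (codebook_size, dim) reparameterized codes
        dists = torch.cdist(z, codebook)            # pairwise L2 distances, (batch, codebook_size)
        indices = dists.argmin(dim=-1)              # nearest-neighbor code selection
        z_q = codebook[indices]                     # quantized vectors, (batch, dim)
        # Straight-through estimator so gradients reach the encoder through quantization.
        z_q_st = z + (z_q - z).detach()
        commit_loss = F.mse_loss(z, z_q.detach())   # standard VQ commitment term (assumed here)
        return z_q_st, indices, commit_loss
```

The key difference from vanilla VQ in this sketch is that the trainable parameters live in `self.proj`, so the loss gradient updates every code vector through the shared matrix rather than only the codes returned by the nearest-neighbor search.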
Problem

Research questions and friction points this paper is trying to address.

Prevent representation collapse in Vector Quantized models
Improve codebook utilization without reducing model capacity
Optimize entire linear space instead of individual code vectors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reparameterizes code vectors via linear transformation
Optimizes entire linear space not individual vectors
SimVQ prevents collapse with a single linear layer
🔎 Similar Papers
No similar papers found.
Yongxin Zhu
Professor, Chinese Academy of Sciences; Adjunct Professor, Shanghai Jiao Tong University
zhuyongxin@sari.ac.cn, zhuyongxin@sjtu.edu.cn
Bocheng Li
University of Science and Technology of China
Yifei Xin
Peking University
Linli Xu
University of Science and Technology of China