HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

📅 2024-03-18
🏛️ arXiv.org
📈 Citations: 3
Influential citations: 0
🤖 AI Summary
Problem: Euclidean vector quantization (VQ) of continuous visual and auditory signals suffers from codebook collapse, poor cluster separability, and inefficient packing. Method: This paper proposes HyperVQ, which formulates VQ as a multinomial logistic regression (MLR) problem in hyperbolic space, using the Poincaré ball model and representing each codebook vector as the geometric representative of a hyperbolic decision hyperplane. Because volume in hyperbolic space grows exponentially with radius, rather than polynomially as in Euclidean space, there is more room to separate clusters, which is intended to improve codebook utilization and mitigate collapse. Results: Experiments show that HyperVQ matches Euclidean VQ-VAEs on generative and reconstruction tasks while outperforming them on discriminative tasks, and that its codebook is used more efficiently and is more disentangled.

📝 Abstract
The success of models operating on tokenized data has heightened the need for effective tokenization methods, particularly in vision and auditory tasks where inputs are naturally continuous. A common solution is to employ Vector Quantization (VQ) within VQ Variational Autoencoders (VQ-VAEs), transforming inputs into discrete tokens by clustering embeddings in Euclidean space. However, Euclidean embeddings not only suffer from inefficient packing and limited separation, owing to their polynomial volume growth, but are also prone to codebook collapse, where only a small subset of codebook vectors is effectively utilized. To address these limitations, we introduce HyperVQ, a novel approach that formulates VQ as a hyperbolic Multinomial Logistic Regression (MLR) problem, leveraging the exponential volume growth in hyperbolic space to mitigate collapse and improve cluster separability. Additionally, HyperVQ represents codebook vectors as geometric representatives of hyperbolic decision hyperplanes, encouraging disentangled and robust latent representations. Our experiments demonstrate that HyperVQ matches traditional VQ in generative and reconstruction tasks, while surpassing it in discriminative performance and yielding a more efficient and disentangled codebook.
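The contrast between polynomial and exponential volume growth can be made concrete with a small worked example. The sketch below is not taken from the paper; it assumes the simplest case, a two-dimensional plane with curvature -1, where a disk of radius r has area 2π(cosh r − 1), and compares it with the Euclidean area πr². The function names are illustrative only.

```python
import math

def euclidean_disk_area(r: float) -> float:
    """Area of a Euclidean disk of radius r: grows polynomially (r squared)."""
    return math.pi * r ** 2

def hyperbolic_disk_area(r: float) -> float:
    """Area of a disk of geodesic radius r in the hyperbolic plane
    (curvature -1, an assumption of this sketch): grows like e^r for large r."""
    return 2.0 * math.pi * (math.cosh(r) - 1.0)

# Compare how much "room" each geometry offers as the radius increases.
for r in (1.0, 2.0, 5.0, 10.0):
    ratio = hyperbolic_disk_area(r) / euclidean_disk_area(r)
    print(f"r={r:>4}: hyperbolic/Euclidean area ratio = {ratio:.1f}")
```

The ratio increases rapidly with r, which is the geometric intuition behind the claim that hyperbolic space leaves more room to keep clusters apart.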
Problem

Research questions and friction points this paper is trying to address.

Improves tokenization for continuous data in vision and auditory tasks
Addresses codebook collapse and inefficient packing in Euclidean VQ
Enhances cluster separability and disentangled representations via hyperbolic space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperbolic space for efficient vector quantization
MLR-based clustering to prevent codebook collapse
Geometric hyperbolic decision hyperplanes for robustness
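
The idea of treating quantization as hyperbolic MLR can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Poincaré-ball MLR formulation commonly used in hyperbolic neural networks, in which each class k is a hyperplane with an offset point p_k inside the ball and a normal vector a_k. The summary above does not give the exact formula, how the codebook vector is derived from the hyperplane, or how training gradients are handled, so the plain argmax selection and all names here (mobius_add, expmap0, hyperbolic_mlr_logits, the array shapes) are hypothetical.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition of x and y on the Poincare ball with curvature -c.
    Broadcasts over leading axes; the last axis is the feature axis."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin: sends Euclidean vectors into the ball."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def hyperbolic_mlr_logits(x, p, a, c=1.0, eps=1e-9):
    """Signed, scaled distances from points to K hyperbolic hyperplanes.

    x: (N, D) points inside the ball
    p: (K, D) hyperplane offsets inside the ball (assumed parameterization)
    a: (K, D) hyperplane normals
    returns: (N, K) logits
    """
    diff = mobius_add(-p[None, :, :], x[:, None, :], c)      # (N, K, D)
    diff2 = np.sum(diff * diff, axis=-1)                      # (N, K)
    inner = np.sum(diff * a[None, :, :], axis=-1)             # (N, K)
    a_norm = np.maximum(np.linalg.norm(a, axis=-1), eps)      # (K,)
    lam_p = 2.0 / (1.0 - c * np.sum(p * p, axis=-1))          # (K,) conformal factor
    denom = np.maximum(1.0 - c * diff2, eps) * a_norm
    return lam_p * a_norm / np.sqrt(c) * np.arcsinh(2 * np.sqrt(c) * inner / denom)

# Toy usage with random data standing in for encoder outputs (hypothetical).
rng = np.random.default_rng(0)
z_e = rng.normal(size=(4, 8))                  # 4 embeddings, 8 dimensions
x = expmap0(z_e)                               # project into the Poincare ball
p = expmap0(0.1 * rng.normal(size=(16, 8)))    # 16 hyperplane offsets
a = rng.normal(size=(16, 8))                   # 16 hyperplane normals
logits = hyperbolic_mlr_logits(x, p, a)
codes = logits.argmax(axis=-1)                 # one discrete token per embedding
print(codes)
```

Here the argmax plays the role that nearest-neighbor lookup plays in Euclidean VQ: each embedding is assigned the index of the hyperplane with the largest logit. A trainable version would need a differentiable or straight-through selection step, which this sketch leaves out.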