🤖 AI Summary
This work addresses the limited ability of conventional VQ-VAE models to learn rich and discriminative discrete representations, a limitation attributed to their constrained codebook capacity. To overcome this, the authors propose a Spherical Angular Margin Prior (SAMP). SAMP combines a ball-bounded norm constraint with an arccosine additive margin loss, which together increase angular separation among codebook vectors and spread them more uniformly over the sphere. The result is a spherical vector quantization framework that improves representation diversity, reconstruction fidelity, and sample quality in image reconstruction and generation tasks. Empirical results show that the proposed method matches or surpasses current state-of-the-art baselines.
📝 Abstract
The Vector Quantized Variational Autoencoder (VQ-VAE) has become a fundamental framework for learning discrete representations in image modeling. However, VQ-VAE models must tokenize entire images using a finite set of codebook vectors, and this limited capacity restricts their ability to capture rich and diverse representations. In this paper, we propose the ArcCosine Additive Margin VQ-VAE (ArcVQ-VAE), a novel vector quantization framework that introduces a spherical angular-margin prior (SAMP) for the codebook of a conventional VQ-VAE. SAMP consists of two components: Ball-Bounded Norm Regularization, which constrains all codebook vectors to lie within a time-dependent Euclidean ball, and an ArcCosine Additive Margin Loss, which encourages greater angular separability among latent vectors. Together, these promote more discriminative and uniformly dispersed latent representations within the constrained space, improving effective latent-space coverage and, in turn, codebook utilization. Experimental results on standard image reconstruction and generation tasks show that ArcVQ-VAE achieves competitive performance against baseline models in terms of reconstruction accuracy, representation diversity, and sample quality. The code is available at: https://github.com/goals4292/ArcVQ-VAE
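The two SAMP components can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository is the authoritative source). Several choices are assumptions: the linear radius schedule, the squared-hinge form of the norm penalty, and the ArcFace-style placement of the margin. For the margin, the sketch adds it to the angle between an encoder output and its assigned code, with the codebook acting as class weights. The names `ball_radius`, `ball_norm_penalty`, `nearest_code`, and `arccos_margin_loss`, along with the `margin` and `scale` defaults, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, clamped to [-1, 1] so math.acos() is always defined."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    c = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return max(-1.0, min(1.0, c))


def ball_radius(step: int, total_steps: int, r_min: float = 0.5, r_max: float = 1.0) -> float:
    """HYPOTHETICAL time-dependent radius: grows linearly from r_min to r_max."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return r_min + frac * (r_max - r_min)


def ball_norm_penalty(codebook: Sequence[Vector], radius: float) -> float:
    """ASSUMED form: mean squared hinge on codebook norms that exceed the radius."""
    return sum(max(l2_norm(e) - radius, 0.0) ** 2 for e in codebook) / len(codebook)


def nearest_code(z: Vector, codebook: Sequence[Vector]) -> int:
    """Standard VQ assignment: index of the Euclidean-nearest codebook vector."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)


def arccos_margin_loss(z: Vector, codebook: Sequence[Vector], target: int,
                       margin: float = 0.2, scale: float = 10.0) -> float:
    """ASSUMED ArcFace-style loss: cross-entropy over scaled cosines, with an
    additive angular margin on the angle to the assigned (target) code."""
    logits: List[float] = []
    for j, e in enumerate(codebook):
        theta = math.acos(cosine(z, e))
        if j == target:
            # Enlarge the target angle; clamp so cos() stays monotone on [0, pi].
            theta = min(theta + margin, math.pi)
        logits.append(scale * math.cos(theta))
    # Numerically stable log-sum-exp minus the target logit.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


codebook = [[1.0, 0.0], [0.0, 1.0], [-1.5, 0.0]]
z = [0.9, 0.1]
k = nearest_code(z, codebook)                # 0
r = ball_radius(step=500, total_steps=1000)  # 0.75
penalty = ball_norm_penalty(codebook, r)     # (0.25**2 + 0.25**2 + 0.75**2) / 3 ≈ 0.229
loss = arccos_margin_loss(z, codebook, k)
```

In a training loop, both terms would presumably be added, with tuned weights, to the usual VQ-VAE reconstruction, codebook, and commitment losses. Adding the margin lowers the assigned code's logit. Minimizing the loss therefore pushes the encoder output angularly closer to its code than the plain cosine would require, which is one way to encourage the angular separability described above.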