🤖 AI Summary
Existing vector quantization (VQ) generative models rely on fixed codebooks, resulting in inflexible bitrates, the need for repeated retraining, and a fundamental trade-off between compression efficiency and reconstruction fidelity. This work proposes a multi-rate codebook adaptation framework that, for the first time, enables a single pre-trained VQ model to generate discrete representations at arbitrary bitrates without retraining. Our approach comprises two key innovations: (1) a data-driven mechanism for generating multi-rate codebooks, and (2) a lightweight adaptation method for pre-trained VQ models, leveraging hierarchical clustering and codebook embedding interpolation. Experiments demonstrate consistent and significant improvements over fixed-codebook baselines across diverse bitrates. The framework supports continuous, fine-grained rate-distortion control, substantially enhancing the generalizability, deployment flexibility, and inference efficiency of VQ models in practical applications.
📝 Abstract
Learning discrete representations with vector quantization (VQ) has emerged as a powerful approach in various generative models. However, most VQ-based models rely on a single, fixed-rate codebook, requiring extensive retraining for new bitrates or efficiency requirements. We introduce Rate-Adaptive Quantization (RAQ), a multi-rate codebook adaptation framework for VQ-based generative models. RAQ applies a data-driven approach to generate variable-rate codebooks from a single baseline VQ model, enabling flexible trade-offs between compression and reconstruction fidelity. Additionally, we provide a simple clustering-based procedure for pre-trained VQ models, offering an alternative when retraining is infeasible. Our experiments show that RAQ performs effectively across multiple rates, often outperforming conventional fixed-rate VQ baselines. By enabling a single system to seamlessly handle diverse bitrate requirements, RAQ extends the adaptability of VQ-based generative models and broadens their applicability to data compression, reconstruction, and generation tasks.
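To make the clustering-based adaptation idea concrete, here is a minimal sketch of how a fixed codebook might be shrunk by hierarchical (agglomerative) merging of nearby code vectors, or grown by interpolating between existing code embeddings. The function names, the greedy pairwise-merge strategy, and the random interpolation scheme are illustrative assumptions on our part, not the paper's actual procedure:

```python
import numpy as np

def reduce_codebook(codebook: np.ndarray, target_size: int) -> np.ndarray:
    """Shrink a codebook (lower bitrate) by greedy agglomerative merging:
    repeatedly replace the two closest code vectors with their mean.
    NOTE: illustrative sketch only, not the paper's exact algorithm."""
    codes = [c for c in codebook.astype(float)]
    while len(codes) > target_size:
        best, best_d = (0, 1), np.inf
        for i in range(len(codes)):
            for j in range(i + 1, len(codes)):
                d = np.sum((codes[i] - codes[j]) ** 2)
                if d < best_d:
                    best_d, best = d, (i, j)
        i, j = best
        merged = (codes[i] + codes[j]) / 2.0
        codes = [c for k, c in enumerate(codes) if k not in (i, j)] + [merged]
    return np.stack(codes)

def expand_codebook(codebook: np.ndarray, target_size: int, seed: int = 0) -> np.ndarray:
    """Grow a codebook (higher bitrate) by interpolating between randomly
    chosen pairs of existing code embeddings (again, an assumed scheme)."""
    rng = np.random.default_rng(seed)
    codes = [c for c in codebook.astype(float)]
    while len(codes) < target_size:
        i, j = rng.choice(len(codes), size=2, replace=False)
        t = rng.uniform(0.25, 0.75)
        codes.append((1 - t) * codes[i] + t * codes[j])
    return np.stack(codes)

# Example: adapt a 16-entry, 4-dim codebook to two other rates without retraining.
cb = np.random.default_rng(1).normal(size=(16, 4))
small = reduce_codebook(cb, 8)    # 4 bits/token -> 3 bits/token
big = expand_codebook(cb, 32)     # 4 bits/token -> 5 bits/token
```

The key property this illustrates is that both directions operate purely on the codebook embeddings of a single trained model, so the encoder and decoder weights stay untouched while the effective bitrate changes.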