Variable Bitrate Residual Vector Quantization for Audio Coding

📅 2024-10-08
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing neural audio compression models use residual vector quantization (RVQ) with a fixed number of codebooks per frame, which limits the rate-distortion trade-off and wastes bits on low-complexity frames such as silence. To address this, we propose variable bitrate RVQ (VRVQ), which selects the number of active codebooks frame by frame and so allocates bitrate according to local audio complexity. To keep training end-to-end differentiable, we introduce a gradient estimation method for the non-differentiable masking operation that turns the importance map into a binary importance mask, used together with a straight-through estimator (STE). Experiments show that the proposed training framework outperforms the baseline method and yields further improvement when applied to the current state-of-the-art codec.

📝 Abstract
Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly in scenarios with simple input audio, such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms from the importance map to the binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec.
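
The idea of spending fewer codebooks on simple frames can be illustrated with a toy residual quantizer. The following is a minimal sketch, not the authors' implementation: the random codebooks, the helper names (`nearest`, `rvq_encode_frame`, `total_bits`), and the hand-picked per-frame codebook counts are all assumptions made for illustration. In the paper the counts are derived from a learned importance map, and any side information needed to signal them is ignored here.

```python
import math
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode_frame(frame, codebooks, n_active):
    """Quantize one frame using only the first n_active codebooks."""
    residual = list(frame)
    recon = [0.0] * len(frame)
    indices = []
    for codebook in codebooks[:n_active]:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        # Each stage quantizes what the previous stages left over.
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, recon


def total_bits(active_counts, codebook_size):
    """Bits spent on code indices only (no side information for the counts)."""
    return sum(n * math.log2(codebook_size) for n in active_counts)


# Toy setup (assumed): 4 codebooks of 16 random 2-D codewords,
# with the codeword scale halving at each stage.
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1, 1) / (2 ** stage) for _ in range(2)] for _ in range(16)]
    for stage in range(4)
]

frames = [[0.0, 0.01], [0.7, -0.4]]  # a near-silent frame and a busier one
active_counts = [1, 4]               # hypothetical per-frame decisions

for frame, n in zip(frames, active_counts):
    indices, recon = rvq_encode_frame(frame, codebooks, n)
    print(n, indices, recon)

print(total_bits(active_counts, 16), "bits variable vs",
      total_bits([4, 4], 16), "bits fixed")
```

Because the stages are applied greedily in order, the indices produced with fewer active codebooks are a prefix of those produced with more, which is what makes truncating the codebook stack per frame straightforward.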

Problem

Research questions and friction points this paper is trying to address.

Adapting codebook count per frame for efficient audio coding
Improving gradient estimation for non-differentiable masking operations
Enhancing rate-distortion tradeoff in neural audio compression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Variable bitrate RVQ adapts the number of codebooks used per frame
Gradient estimation for non-differentiable masking operation
Straight-through estimator improves model training
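
The masking step and its straight-through treatment can be sketched as follows. This is an illustrative assumption rather than the paper's exact estimator: the thresholding rule, the sigmoid stand-in, the `sharpness` parameter, and all function names are hypothetical. The point it shows is only that the hard mask has zero gradient almost everywhere, so the backward pass borrows the gradient of a smooth function while the forward pass keeps the binary values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_mask(importance, n_codebooks):
    """Forward pass: codebook k is active when k < importance * n_codebooks."""
    scaled = importance * n_codebooks
    return [1.0 if k < scaled else 0.0 for k in range(n_codebooks)]


def surrogate_grad(importance, n_codebooks, sharpness=4.0):
    """Derivative w.r.t. importance of sigmoid(sharpness * (scaled - k))."""
    scaled = importance * n_codebooks
    grads = []
    for k in range(n_codebooks):
        s = sigmoid(sharpness * (scaled - k))
        grads.append(sharpness * n_codebooks * s * (1.0 - s))
    return grads


def ste_mask(importance, n_codebooks, sharpness=4.0):
    """Straight-through pair: hard values forward, surrogate gradients backward."""
    return (
        hard_mask(importance, n_codebooks),
        surrogate_grad(importance, n_codebooks, sharpness),
    )


mask, grad = ste_mask(0.6, 4)
print(mask)  # binary mask used in the forward pass
print(grad)  # nonzero gradients passed to the importance value
```

In an autodiff framework the same effect is commonly written as `soft + stop_gradient(hard - soft)`, which evaluates to the hard mask but differentiates as the soft one.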