Improving Test-Time Performance of RVQ-based Neural Codecs

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
RVQ-based neural audio codecs suffer from high quantization error and limited synthesis quality during inference due to fixed codebooks. Method: This paper proposes a test-time dynamic code selection algorithm that requires no retraining. At inference, it jointly optimizes discrete codebook indices across all RVQ levels via hierarchical codebook search and layer-wise quantization error minimization, overcoming the suboptimality of conventional greedy encoding. Contribution/Results: The core innovation lies in formulating codebook selection as a differentiable path optimization problem—preserving discrete constraints while enabling end-to-end gradient backpropagation of reconstruction error. Experiments demonstrate significant improvements in both objective metrics (e.g., LSD, MRSTFT) and subjective MOS scores, with an average 23.6% reduction in quantization error. The method is fully compatible with existing RVQ models and incurs negligible deployment overhead.

Technology Category

Application Category

📝 Abstract
The residual vector quantization (RVQ) technique plays a central role in recent advances in neural audio codecs. These models effectively synthesize high-fidelity audio from a limited number of codes due to the hierarchical structure among quantization levels. In this paper, we propose an encoding algorithm to further enhance the synthesis quality of RVQ-based neural codecs at test-time. Firstly, we point out the suboptimal nature of quantized vectors generated by conventional methods. We demonstrate that quantization error can be mitigated by selecting a different set of codes. Subsequently, we present our encoding algorithm, designed to identify a set of discrete codes that achieve a lower quantization error. We then apply the proposed method to pre-trained models and evaluate its efficacy using diverse metrics. Our experimental findings validate that our method not only reduces quantization errors, but also improves synthesis quality.
Problem

Research questions and friction points this paper is trying to address.

Enhancing synthesis quality of RVQ-based neural audio codecs
Reducing quantization errors in residual vector quantization systems
Improving test-time performance of existing pre-trained codec models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances RVQ codecs with improved encoding algorithm
Reduces quantization error by selecting optimal code sets
Improves synthesis quality in pre-trained neural models
🔎 Similar Papers
2024-10-08IEEE International Conference on Acoustics, Speech, and Signal ProcessingCitations: 0