🤖 AI Summary
To address rigid bit-rate allocation and redundant bit consumption by noise components in residual vector quantization (RVQ) speech coding under realistic noisy conditions, this paper proposes a variable-bit-rate RVQ (VRVQ) framework. Our method introduces: (1) the first noise-aware, dynamic frame-level bit-rate allocation mechanism, which adaptively adjusts quantization precision based on speech saliency and noise intensity; and (2) the first end-to-end jointly optimized neural feature-domain denoiser integrated into the RVQ quantization loop. Under rate-distortion joint optimization, VRVQ significantly improves coding efficiency across diverse noise scenarios: at equal bit rates, it achieves a mean opinion score (MOS) gain of ≥0.8 over baseline methods, with superior speech intelligibility and subjective quality compared to constant-bit-rate RVQ (CBR-RVQ) and conventional codecs.
📝 Abstract
Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades compression efficiency. Standard codecs allocate bits uniformly, wasting bitrate on noise components that do not contribute to intelligibility. This paper introduces a Variable Bitrate RVQ (VRVQ) framework for noise-robust speech coding, dynamically adjusting bitrate per frame to optimize rate-distortion trade-offs. Unlike constant bitrate (CBR) RVQ, our method prioritizes critical speech components while suppressing residual noise. Additionally, we integrate a feature denoiser to further improve noise robustness. Experimental results show that VRVQ improves rate-distortion trade-offs over conventional methods, achieving better compression efficiency and perceptual quality in noisy conditions. Samples are available at our project page: https://yoongi43.github.io/noise_robust_vrvq/.