PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Residual vector quantization (RVQ) in neural speech codecs suffers from training instability and inefficient residual decomposition, limiting reconstruction quality and robustness. To address this, the paper proposes PURE Codec (Progressive Unfolding of Residual Entropy), a framework that uses a pre-trained speech enhancement model as a guidance signal to orchestrate multi-stage residual quantization: the first stage quantizes low-entropy components to secure foundational fidelity, and subsequent stages progressively model high-entropy details. This entropy-aware hierarchical quantization substantially improves training stability and rate-distortion performance. Experiments show that PURE consistently outperforms standard RVQ under both clean and noisy conditions, with significant gains in objective speech reconstruction metrics (e.g., MCD, PESQ) and downstream speech synthesis tasks, particularly in noise robustness.

📝 Abstract
Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode residual high-entropy components. This design significantly improves training stability. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction and in downstream speech-language-model-based text-to-speech, particularly under noisy training conditions.
Problem

Research questions and friction points this paper is trying to address.

Improves training stability of neural speech codecs
Enhances reconstruction quality under noisy conditions
Optimizes multi-stage quantization using speech enhancement guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained speech enhancement model for quantization guidance
First stage reconstructs low-entropy denoised speech embeddings
Subsequent stages encode residual high-entropy components
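The staged procedure in the bullets above can be sketched as follows. This is a minimal NumPy illustration with nearest-neighbour codebooks: the first stage quantizes the enhancement model's denoised (low-entropy) embedding, and later stages quantize the remaining high-entropy residual of the full signal. All names, shapes, and the codebook scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def quantize(codebook, x):
    """Nearest-neighbour vector quantization of each frame in x.
    codebook: (K, D) codewords; x: (T, D) frames."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    return codebook[dists.argmin(axis=1)]                          # (T, D)

def pure_style_rvq(x_noisy, x_denoised, codebooks):
    """Enhancement-guided progressive residual quantization (sketch).
    Stage 1 targets the denoised embedding; later stages encode the
    residual between the full signal and the running reconstruction."""
    # Stage 1: foundational, low-entropy layer guided by the enhancement model.
    q = quantize(codebooks[0], x_denoised)
    residual = x_noisy - q
    # Subsequent stages: progressively encode high-entropy residual detail.
    for cb in codebooks[1:]:
        q_stage = quantize(cb, residual)
        q = q + q_stage
        residual = residual - q_stage
    return q  # reconstruction = sum of all stage outputs

# Toy usage with random data and random (untrained) codebooks.
rng = np.random.default_rng(0)
T, D, K = 8, 4, 16
codebooks = [rng.normal(size=(K, D)) for _ in range(3)]
x_denoised = rng.normal(size=(T, D))
x_noisy = x_denoised + 0.1 * rng.normal(size=(T, D))
recon = pure_style_rvq(x_noisy, x_denoised, codebooks)
err = float(((x_noisy - recon) ** 2).mean())
```

In a trained codec the codebooks would be learned (e.g. with straight-through estimation and commitment losses, as in standard RVQ); the point of the sketch is only the ordering, in which the denoised target defines the first stage and residuals carry the rest.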