🤖 AI Summary
Residual vector quantization (RVQ) in neural speech codecs suffers from training instability and inefficient residual decomposition, limiting reconstruction quality and robustness. To address this, we propose PURE Codec (Progressive Unfolding of Residual Entropy), a framework that uses a pre-trained speech enhancement model to guide multi-stage residual quantization: the first stage quantizes low-entropy, denoised components to secure foundational fidelity, and later stages progressively model the remaining high-entropy detail. This entropy-aware hierarchy substantially improves training stability and rate-distortion performance. Experiments demonstrate that PURE consistently outperforms standard RVQ under both clean and noisy conditions, achieving significant gains in objective speech reconstruction metrics (e.g., MCD, PESQ) and downstream speech synthesis tasks, with particular benefits for noise robustness.
📝 Abstract
Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective residual decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode the residual high-entropy components. This design significantly improves training stability. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in speech reconstruction and in downstream text-to-speech based on speech language models, particularly under noisy training conditions.
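To make the progressive scheme concrete, the sketch below gives one plausible reading of the idea in PyTorch. It is not the released implementation: the class names (`VectorQuantizer`, `EnhancementGuidedRVQ`), the codebook size, the number of stages, and the assumption that the low-entropy target is an encoder embedding of the enhanced (denoised) signal are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Single codebook with a straight-through estimator (standard VQ-VAE style)."""

    def __init__(self, codebook_size: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, x):  # x: (batch, time, dim)
        w = self.codebook.weight                                   # (K, dim)
        # Squared Euclidean distance from each frame to every codeword.
        dist = (x.pow(2).sum(-1, keepdim=True)
                - 2 * x @ w.t()
                + w.pow(2).sum(-1))                                # (batch, time, K)
        idx = dist.argmin(dim=-1)                                  # (batch, time)
        q = self.codebook(idx)                                     # (batch, time, dim)
        commit_loss = F.mse_loss(x, q.detach()) + F.mse_loss(x.detach(), q)
        q = x + (q - x).detach()                                   # straight-through gradient
        return q, commit_loss


class EnhancementGuidedRVQ(nn.Module):
    """Hypothetical sketch: stage 1 quantizes the low-entropy (denoised) embedding
    supplied by a frozen speech enhancement model; the remaining stages quantize
    the residual between the full embedding and what has been quantized so far."""

    def __init__(self, dim: int = 256, codebook_size: int = 1024, n_stages: int = 4):
        super().__init__()
        self.quantizers = nn.ModuleList(
            [VectorQuantizer(codebook_size, dim) for _ in range(n_stages)]
        )

    def forward(self, full_emb, denoised_emb):
        # Stage 1: reconstruct the denoised (low-entropy) embedding first.
        quantized, loss = self.quantizers[0](denoised_emb)
        # Later stages: progressively absorb the high-entropy residual.
        residual = full_emb - quantized
        for vq in self.quantizers[1:]:
            q, l = vq(residual)
            quantized = quantized + q
            residual = residual - q
            loss = loss + l
        return quantized, loss


# Usage sketch: embeddings would come from the codec encoder applied to the
# noisy input and to its enhanced (denoised) version, respectively.
enc_noisy = torch.randn(2, 100, 256)
enc_clean = torch.randn(2, 100, 256)
rvq = EnhancementGuidedRVQ()
recon_emb, vq_loss = rvq(enc_noisy, enc_clean)
```

The only departure from plain RVQ in this sketch is the first stage: its target is the low-entropy, denoised embedding rather than the full embedding, leaving the later codebooks to model the high-entropy remainder.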