CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

📅 2026-03-02
🤖 AI Summary
This work addresses the loss of speech clarity and intelligibility caused by missing high-frequency components in low-bandwidth audio. Existing approaches, which typically model spectrograms or waveforms, tend to trade computational efficiency against high-frequency fidelity. To overcome these limitations, the study applies conditional flow matching in the compact latent space of a neural audio codec, combining a voicing-aware conditional flow converter that operates on continuous codec embeddings with a structure-constrained residual vector quantizer. The framework is optimized end-to-end. The authors report improved latent alignment stability, strong spectral fidelity, and enhanced perceptual quality on 8 kHz to 16 kHz and 8 kHz to 44.1 kHz bandwidth extension tasks.
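
As a rough illustration of what conditional flow matching involves, the sketch below uses the common straight-line (linear interpolation) path in NumPy. It is not CodecFlow's implementation: the function names, the `cond` argument standing in for low-bandwidth codec latents, the latent shapes, and the Euler solver are all assumptions, and voicing features are omitted entirely.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Return a point on the straight path from x0 to x1 and its target velocity.

    x0: noise samples, shape (N, D)          (assumed prior)
    x1: target wideband latents, shape (N, D) (hypothetical codec embeddings)
    t:  times in [0, 1], shape (N,)
    """
    t = t[:, None]                      # broadcast time over the latent dimension
    xt = (1.0 - t) * x0 + t * x1        # linear interpolation between x0 and x1
    v_target = x1 - x0                  # velocity is constant along a straight path
    return xt, v_target

def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_fn, x0, cond, steps=8):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In training, a network would take `xt`, `t`, and the conditioning latents and be fit with `cfm_loss`; at inference, `euler_sample` would be called with that trained network as `velocity_fn`. Working in a codec latent space keeps `D` and the sequence length small relative to waveform or spectrogram modeling, which is the efficiency argument made above.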

📝 Abstract
Speech bandwidth extension (BWE) improves clarity and intelligibility by restoring or inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or waveform modeling, which can incur higher computational cost and have limited high-frequency fidelity. Neural audio codecs offer compact latent representations that better preserve acoustic detail, yet accurately recovering high-resolution latent information remains challenging due to representation mismatch. We present CodecFlow, a neural codec-based BWE framework that performs efficient speech reconstruction in a compact latent space. CodecFlow employs a voicing-aware conditional flow converter on continuous codec embeddings and a structure-constrained residual vector quantizer to improve latent alignment stability. Optimized end-to-end, CodecFlow achieves strong spectral fidelity and enhanced perceptual quality on 8 kHz to 16 kHz and 8 kHz to 44.1 kHz speech BWE tasks.
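
For readers unfamiliar with the quantizer mentioned in the abstract, here is a minimal sketch of plain residual vector quantization in NumPy. The abstract does not define the structure constraint, so none is modeled; `rvq_encode`, the codebook shapes, and the toy values are hypothetical.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize vectors with a cascade of codebooks.

    x:         array of shape (N, D)
    codebooks: list of arrays, each of shape (K, D)
    Returns (indices of shape (N, num_stages), quantized of shape (N, D)).
    """
    residual = np.asarray(x, dtype=float).copy()
    quantized = np.zeros_like(residual)
    indices = []
    for cb in codebooks:
        # Squared Euclidean distance from each residual to each codeword: (N, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)       # nearest codeword per vector
        chosen = cb[idx]
        quantized += chosen             # accumulate the stage-wise approximation
        residual -= chosen              # the next stage quantizes what is left
        indices.append(idx)
    return np.stack(indices, axis=1), quantized

# Toy example with two stages and two codewords per stage.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, 0.25]]),
]
codes, approx = rvq_encode(np.array([[1.2, 1.3]]), codebooks)
```

Each stage refines the approximation left by the previous one, so the continuous embedding is represented by one index per stage.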
Problem

Research questions and friction points this paper is trying to address.

Bandwidth Extension
Neural Codec
Latent Space
High-Frequency Reconstruction
Representation Mismatch
Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional flow matching
neural audio codec
bandwidth extension
latent space modeling
voicing-aware conversion