FlowDec: A flow-based full-band general audio codec with high perceptual quality

📅 2025-03-03

🤖 AI Summary

This work addresses high-fidelity, full-band neural coding of general audio sampled at 48 kHz at low bitrates. FlowDec combines a codec trained without adversarial losses with a stochastic postfilter based on a novel conditional flow matching method. The postfilter needs only 6 DNN evaluations instead of the 60 used by ScoreDec, a 90% reduction, and it reaches this without fine-tuning or distillation. Relative to ScoreDec, FlowDec extends from speech to general audio and lowers the bitrate from 24 kbit/s to as low as 4 kbit/s. It achieves better Fréchet Audio Distance (FAD) scores than the GAN-based baseline DAC, with listening test scores that are on par, and it produces qualitatively more natural reconstructions of speech and of harmonic structures in music.

📝 Abstract

We propose FlowDec, a neural full-band audio codec for general audio sampled at 48 kHz that combines non-adversarial codec training with a stochastic postfilter based on a novel conditional flow matching method. Compared to the prior work ScoreDec, which is based on score matching, we generalize from speech to general audio and move from 24 kbit/s to as low as 4 kbit/s, while improving output quality and reducing the required postfilter DNN evaluations from 60 to 6 without any fine-tuning or distillation techniques. We provide theoretical insights and geometric intuitions for our approach in comparison to ScoreDec as well as another recent work that uses flow matching, and conduct ablation studies on our proposed components. We show that FlowDec is a competitive alternative to the recent GAN-dominated stream of neural codecs, achieving FAD scores better than those of the established GAN-based codec DAC and listening test scores that are on par, and producing qualitatively more natural reconstructions for speech and harmonic structures in music.
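To make the "DNN evaluations" count concrete, here is a minimal sketch of how a flow-based postfilter can be sampled with a fixed evaluation budget. It is not FlowDec's implementation. The plain Euler solver, the function names, and the toy velocity field (which stands in for the trained DNN and simply points along a straight line to a known target) are all assumptions made for illustration; the abstract does not state which solver is used.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the postfilter DNN.

    Returns the exact velocity of a straight-line path that reaches
    `target` at t = 1. A real postfilter would predict this from data.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_postfilter(x_start, target, num_steps=6):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps.

    Each step costs exactly one velocity evaluation, so `num_steps`
    equals the number of (stand-in) DNN evaluations.
    """
    x = list(x_start)
    evals = 0
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # largest t is (num_steps - 1) / num_steps, so 1 - t > 0
        v = toy_velocity(x, t, target)
        evals += 1
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x, evals


# Toy data: a "clean" signal and a noisy starting point (assumed setup).
rng = random.Random(0)
clean = [0.5, -0.25, 0.1, 0.0]
noisy_start = [c + 0.3 * rng.gauss(0.0, 1.0) for c in clean]

restored, evals = euler_postfilter(noisy_start, clean, num_steps=6)
max_err = max(abs(a - b) for a, b in zip(restored, clean))

# Relative saving when going from 60 evaluations to 6.
reduction = 1.0 - 6 / 60
```

Because the toy path is a straight line, Euler integration lands on the target with any step count. A learned velocity field is generally not this well behaved, which is why a small evaluation budget is a meaningful result rather than a given.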

Problem

Research questions and friction points this paper is trying to address.

Develops a neural audio codec for 48 kHz general audio

Reduces bitrate to 4 kbit/s while improving quality

Competes with GAN-based codecs in perceptual quality
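For a sense of scale, the bitrates above can be compared against uncompressed audio. The sketch below assumes 16-bit mono PCM as the reference, which the page does not specify, and the helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM in kbit/s (16-bit mono is an assumption)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(sample_rate_hz, codec_kbps, bits_per_sample=16, channels=1):
    """How many times smaller the coded stream is than the PCM reference."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / codec_kbps


reference_kbps = pcm_bitrate_kbps(48_000)        # 768.0 kbit/s
ratio_at_24 = compression_ratio(48_000, 24.0)    # 32.0
ratio_at_4 = compression_ratio(48_000, 4.0)      # 192.0
bits_per_sample_at_4 = 4_000 / 48_000            # about 0.083 bits per sample
```

Under these assumptions, 4 kbit/s leaves well under a tenth of a bit per audio sample, which is why the decoder has to synthesize plausible detail rather than merely reconstruct it.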

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural full-band audio codec at 48 kHz

Stochastic postfilter with conditional flow matching

Bitrates as low as 4 kbit/s with improved output quality

🔎 Similar Papers

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates