FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

📅 2024-09-26
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
To balance audio fidelity against compression efficiency in general audio coding at low bit rates (e.g., 3 kbps), this paper proposes FlowMAC, a neural audio codec based on conditional flow matching (CFM) and, according to the authors, the first CFM-based approach to general audio coding. FlowMAC jointly learns a mel-spectrogram encoder, a quantizer, and a decoder. At inference time, the decoder uses an ODE solver to integrate a continuous normalizing flow and generate a high-quality mel spectrogram. In subjective evaluations, FlowMAC at 3 kbps achieves quality similar to state-of-the-art GAN-based and DDPM-based neural audio codecs operating at double the bit rate. Training is scalable, simple, and memory efficient, and the inference pipeline can be tuned to trade complexity against quality, which enables real-time coding on a CPU.

📝 Abstract
This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer, and decoder. At inference time, the decoder integrates a continuous normalizing flow via an ODE solver to generate a high-quality mel spectrogram. This is the first time that a CFM-based approach has been applied to general audio coding, enabling scalable, simple, and memory-efficient training. Our subjective evaluations show that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art GAN-based and DDPM-based neural audio codecs at double the bit rate. Moreover, FlowMAC offers a tunable inference pipeline that allows complexity to be traded off against quality. This enables real-time coding on CPU while maintaining high perceptual quality.
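The inference step described in the abstract, integrating a flow with an ODE solver, can be illustrated with a minimal fixed-step Euler sketch. This is not the authors' implementation: `euler_sample`, `toy_velocity`, and the 8-value vectors are hypothetical stand-ins, and the toy velocity field replaces the trained decoder network so that the example has a known exact solution. It only shows how the number of solver steps acts as a complexity/quality knob.

```python
import math
import random

def euler_sample(velocity_fn, x0, cond, num_steps):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def toy_velocity(x, t, cond):
    # Hypothetical stand-in for a trained network: pulls every value toward cond.
    return [ci - xi for xi, ci in zip(x, cond)]

random.seed(0)
cond = [random.gauss(0.0, 1.0) for _ in range(8)]   # placeholder conditioning signal
noise = [random.gauss(0.0, 1.0) for _ in range(8)]  # starting point of the flow
# Closed-form solution of dx/dt = cond - x at t = 1, used as the reference.
exact = [c + (n - c) * math.exp(-1.0) for n, c in zip(noise, cond)]

def max_error(num_steps):
    """Largest absolute deviation of the Euler result from the exact solution."""
    approx = euler_sample(toy_velocity, noise, cond, num_steps)
    return max(abs(a - e) for a, e in zip(approx, exact))

print(max_error(4), max_error(32))  # more steps: more compute, smaller error
```

In this toy setting, each extra step costs one more evaluation of the velocity function and reduces the integration error, which mirrors the kind of trade-off a tunable inference pipeline exposes.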
Problem

Research questions and friction points this paper is trying to address.

Develops neural audio codec for low bit rates
Applies conditional flow matching to audio coding
Enables real-time CPU coding with high quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional flow matching for low bit rates
Joint learning of encoder, quantizer, decoder
ODE solver for spectrogram generation
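For readers unfamiliar with conditional flow matching, the following sketch shows how one training example is commonly built in the standard optimal-transport CFM formulation. This page does not state FlowMAC's exact loss, so the interpolation path, the `sigma_min` parameter, and the helper names `cfm_training_pair` and `mse` are assumptions for illustration only.

```python
import random

def cfm_training_pair(x1, sigma_min=1e-4, rng=random):
    """Build (t, x_t, target_velocity) for one target sample x1 (flat list of floats).

    Assumed path: x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1, with x0 ~ N(0, I).
    Its time derivative, x1 - (1 - sigma_min) * x0, is the regression target.
    """
    t = rng.random()
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return t, x_t, target

def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

random.seed(0)
x1 = [0.5, -1.0, 2.0, 0.0]  # placeholder for a flattened mel-spectrogram patch
t, x_t, target = cfm_training_pair(x1, sigma_min=0.0)
# A network would predict a velocity from (x_t, t, conditioning); training minimizes
# mse(prediction, target). Following the target from x_t for the remaining time lands on x1.
reached = [a + (1.0 - t) * v for a, v in zip(x_t, target)]
```

Because the target is a simple regression quantity, training under this assumed formulation needs no adversarial discriminator and no simulation of the ODE, which is consistent with the "scalable, simple" training the abstract mentions.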
Authors

Nicola Pia
Fraunhofer IIS
Machine Learning, Audio Coding
Martin Strauss
International Audio Laboratories Erlangen, Erlangen, Germany
M. Multrus
Fraunhofer IIS, Erlangen, Germany; International Audio Laboratories Erlangen, Erlangen, Germany
Bernd Edler
International Audio Laboratories Erlangen, Erlangen, Germany