FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

📅 2024-09-26

🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper targets general-purpose audio coding at very low bit rates (e.g., 3 kbps), where fidelity and compression efficiency are hard to balance. It proposes FlowMAC, a neural audio codec that the authors describe as the first application of conditional flow matching (CFM) to general audio coding. FlowMAC jointly learns a mel-spectrogram encoder, a quantizer, and a decoder. At inference time, the decoder uses an ODE solver to integrate a continuous normalizing flow and generate a high-quality mel spectrogram. In subjective evaluations, FlowMAC at 3 kbps reaches quality similar to state-of-the-art GAN-based and DDPM-based neural audio codecs running at 6 kbps. Training is scalable, simple, and memory efficient, and the inference pipeline is tunable: complexity can be traded against quality, which allows real-time coding on a CPU.

📝 Abstract

This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer and decoder. At inference time, the decoder integrates a continuous normalizing flow via an ODE solver to generate a high-quality mel spectrogram. This is the first time that a CFM-based approach has been applied to general audio coding, enabling scalable, simple and memory-efficient training. Our subjective evaluations show that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art GAN-based and DDPM-based neural audio codecs at double the bit rate. Moreover, FlowMAC offers a tunable inference pipeline, which makes it possible to trade off complexity against quality. This enables real-time coding on CPU while maintaining high perceptual quality.
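The complexity/quality trade-off mentioned in the abstract can be illustrated with a fixed-step ODE solver, where the number of steps sets the cost. The sketch below is only an illustration under stated assumptions: it uses forward Euler, which the abstract does not name as the solver, and `toy_vector_field` is a hypothetical stand-in with a known closed-form solution, not the learned network conditioned on the quantized latents.

```python
import math


def euler_integrate(vector_field, x0, num_steps):
    """Integrate dx/dt = vector_field(x, t) from t = 0 to t = 1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_vector_field(target, rate=2.0):
    """Hypothetical stand-in for a learned vector field: pulls x toward `target`."""
    def field(x, t):
        return [rate * (ti - xi) for xi, ti in zip(x, target)]
    return field


def exact_solution(x0, target, rate=2.0):
    """Closed-form value of x(1) for the toy field, used only to measure solver error."""
    decay = math.exp(-rate)
    return [ti + (xi - ti) * decay for xi, ti in zip(x0, target)]


# Illustrative values only: a 3-dimensional "spectrogram frame".
x0 = [0.0, 1.0, -1.0]        # starting point (noise in a real flow model)
target = [1.0, 0.5, 2.0]     # point the toy field flows toward
field = toy_vector_field(target)
reference = exact_solution(x0, target)

for steps in (2, 4, 8, 16):
    approx = euler_integrate(field, x0, steps)
    error = max(abs(a - b) for a, b in zip(approx, reference))
    print(f"steps={steps:2d}  max_error={error:.4f}")
```

Each step costs one evaluation of the vector field, so in a real decoder halving the step count roughly halves the decoding cost, at the price of a larger integration error.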

Problem

Research questions and friction points this paper is trying to address.

Develops a neural audio codec for low bit rates

Applies conditional flow matching to audio coding

Enables real-time CPU coding with high quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional flow matching for low bit rates

Joint learning of encoder, quantizer, decoder

ODE solver for spectrogram generation
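For readers unfamiliar with conditional flow matching, the training target can be sketched in a few lines. This is a minimal sketch of the commonly used linear (optimal-transport) probability path, which is an assumption here: this page does not state which path or loss weighting FlowMAC uses. The function names and the `sigma_min` default are hypothetical, and the conditioned network that would predict the vector field is not shown.

```python
import random


def cfm_training_pair(x1, sigma_min=1e-4, rng=random):
    """Build one CFM training example for a data point x1 (e.g. a mel frame).

    Returns the sampled time t, the interpolated point x_t, and the target
    vector field u_t for the linear path between Gaussian noise x0 and x1.
    """
    t = rng.random()                              # time in [0, 1)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # Gaussian noise sample
    noise_scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [noise_scale * n + t * d for n, d in zip(x0, x1)]
    u_t = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return t, x_t, u_t


def cfm_loss(predicted_v, u_t):
    """Mean squared error between a predicted vector field and the target."""
    return sum((p - u) ** 2 for p, u in zip(predicted_v, u_t)) / len(u_t)


# Illustrative usage with made-up values; a real model would predict
# predicted_v from (x_t, t) and the quantized conditioning signal.
rng = random.Random(0)
mel_frame = [0.2, -0.5, 1.3, 0.0]
t, x_t, u_t = cfm_training_pair(mel_frame, rng=rng)
predicted_v = [0.0 for _ in u_t]                  # placeholder prediction
print(cfm_loss(predicted_v, u_t))
```

Because the loss is a plain regression on a vector field, training needs no discriminator and no simulation of the ODE, which is consistent with the abstract's description of the training as simple and memory efficient.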

🔎 Similar Papers

MuCodec: Ultra Low-Bitrate Music Codec