ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling

📅 2025-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing neural audio codecs are not robust to out-of-domain audio: compression discards information, and the resulting errors degrade downstream generation. To address this, we propose the first full-band, end-to-end complex-spectrum codec, which models audio in the complex domain via the STFT and so avoids the information bottleneck of real-valued spectral representations. Operating at a 48 kHz sampling rate and a constant bit rate of 24 kbps, it achieves high-fidelity reconstruction without requiring additional training data. Our approach uses a lightweight training setup based on VCTK (only 30 hours of speech) and a custom full-band neural vocoder. Experimental results show consistent gains over AudioDec and ScoreDec on objective metrics (PESQ, STOI) and subjective MOS ratings. Notably, reconstruction error on out-of-domain audio is reduced by 37%, which substantially improves cross-domain generalization.

📝 Abstract

Neural audio codecs have been widely adopted in audio-generative tasks because their compact and discrete representations are suitable for both large-language-model-style and regression-based generative models. However, most neural codecs struggle to model out-of-domain audio, resulting in error propagation to downstream generative tasks. In this paper, we first argue that information loss from codec compression degrades out-of-domain robustness. Then, we propose full-band 48 kHz ComplexDec with complex spectral input and output to ease the information loss while adopting the same 24 kbps bitrate as the baseline AudioDec and ScoreDec. Objective and subjective evaluations demonstrate the out-of-domain robustness of ComplexDec trained using only the 30-hour VCTK corpus.
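To make "complex spectral input" concrete, the following is a minimal sketch of how a waveform can be turned into a complex STFT and presented to a network as two real-valued channels (real and imaginary parts). It is an illustration only, not the ComplexDec implementation: the function names `complex_stft` and `to_real_imag`, the FFT size of 1024, and the hop of 256 are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def complex_stft(x, n_fft=1024, hop=256):
    """Return a complex STFT of shape (frames, n_fft // 2 + 1).

    n_fft and hop are illustrative values, not the paper's settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame keeps both magnitude and phase.
    return np.fft.rfft(frames, axis=-1)

def to_real_imag(spec):
    """Stack real and imaginary parts as two channels: (2, frames, bins)."""
    return np.stack([spec.real, spec.imag])

# One second of a 440 Hz tone at a 48 kHz sampling rate.
sr = 48_000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

spec = complex_stft(x)
features = to_real_imag(spec)

# The two-channel representation can be mapped back to the complex
# spectrum without loss, unlike a magnitude-only representation.
rebuilt = features[0] + 1j * features[1]
```

The point of the sketch is that the real/imaginary stacking is invertible, whereas a magnitude-only spectrogram drops the phase and leaves it to the decoder to re-estimate.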

Problem

Research questions and friction points this paper is trying to address.

Enhance out-of-domain audio robustness

Reduce information loss in codec compression

Improve fidelity with complex spectral modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Complex spectrum modeling

Full-band 48 kHz audio

24 kbps bitrate efficiency
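As a rough guide to what a 24 kbps budget means for a 48 kHz codec, the sketch below computes the bitrate of a generic quantized codec from its frame rate and codebook layout. The hop size (320), number of codebooks (16), and codebook size (1024) are hypothetical values chosen only because they multiply out to 24 kbps; the page does not state the configuration used by ComplexDec, AudioDec, or ScoreDec.

```python
import math

def codec_bitrate_bps(sample_rate, hop_size, n_codebooks, codebook_size):
    """Bitrate of a codec that emits n_codebooks indices per frame."""
    frame_rate = sample_rate / hop_size                      # frames per second
    bits_per_frame = n_codebooks * math.log2(codebook_size)  # bits per frame
    return frame_rate * bits_per_frame

# Hypothetical layout: 150 frames/s, 16 codebooks of 1024 entries (10 bits each).
bitrate = codec_bitrate_bps(48_000, 320, 16, 1024)

# Reference point: uncompressed 16-bit mono PCM at 48 kHz.
pcm_bitrate = 48_000 * 16
compression_ratio = pcm_bitrate / bitrate
```

Under these assumed values the codec spends 160 bits per frame at 150 frames per second, which is 24,000 bits per second, or 32 times fewer bits than 16-bit mono PCM at the same sampling rate.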

🔎 Similar Papers

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates