PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios

📅 2025-10-24
🤖 AI Summary
This paper tackles the trade-off between efficiency and quality in neural speech coding under extreme low-resource constraints (computation below 700 MFLOPs, latency below 30 ms, and dual-bitrate support at 1 kbps and 6 kbps) by proposing an efficient neural speech codec framework, PhoenixCodec. It introduces: (1) an optimized asymmetric frequency-time encoder-decoder architecture that alleviates the resource scattering of conventional decoders; (2) a Cyclical Calibration and Refinement (CCR) training strategy to enhance waveform reconstruction fidelity; and (3) noise-invariant fine-tuning to improve robustness in challenging acoustic conditions such as real-world noise and reverberation. In Track 1 of the LRAC 2025 Challenge, the framework ranked third overall and, at 1 kbps, achieved the best performance both under real-world noise and reverberation and in intelligibility on clean speech.

📝 Abstract
This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency below 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to escape local optima, and enhancing robustness through noisy-sample fine-tuning. In Track 1 of the LRAC 2025 Challenge, the proposed system ranked third overall and, at 1 kbps, achieved the best performance both under real-world noise and reverberation and in intelligibility on clean test data, confirming its effectiveness.
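To make the constraints in the abstract concrete, the short sketch below works out how few bits a codec has per frame at the two target bitrates and checks a candidate model against the stated compute and latency limits. The 20 ms frame duration and the helper names `bits_per_frame` and `within_budget` are assumptions chosen for illustration; the abstract does not state the frame size PhoenixCodec uses.

```python
def bits_per_frame(bitrate_bps: int, frame_ms: float) -> float:
    """Bits available to encode one frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000.0


def within_budget(mflops: float, latency_ms: float,
                  max_mflops: float = 700.0,
                  max_latency_ms: float = 30.0) -> bool:
    """Check the limits quoted in the abstract: strictly below 700 MFLOPs
    and strictly below 30 ms of latency."""
    return mflops < max_mflops and latency_ms < max_latency_ms


if __name__ == "__main__":
    FRAME_MS = 20.0  # hypothetical frame duration, not taken from the paper
    for bitrate_bps in (1000, 6000):  # the two rates named in the abstract
        budget = bits_per_frame(bitrate_bps, FRAME_MS)
        print(f"{bitrate_bps // 1000} kbps -> {budget:.0f} bits per {FRAME_MS:.0f} ms frame")
```

Under this assumed frame size, the 1 kbps mode leaves only 20 bits per frame, which illustrates why quality and intelligibility are hard to preserve at that rate.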
Problem

Research questions and friction points this paper is trying to address.

Optimizing neural speech coding for extreme low-resource conditions
Balancing efficiency and quality under strict computational constraints
Enhancing robustness under real-world noise and reverberation while preserving intelligibility on clean speech
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized asymmetric frequency-time architecture
Cyclical Calibration and Refinement (CCR) training strategy
Noise-invariant fine-tuning for enhanced robustness
Zixiang Wan
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University, Shenzhen, China
Haoran Zhao
Audio Innovation Technology Department, Anker Inc, Beijing, China
Guochang Zhang
Audio Innovation Technology Department, Anker Inc, Beijing, China
Runqiang Han
Audio Innovation Technology Department, Anker Inc, Beijing, China
Jianqiang Wei
Audio Innovation Technology Department, Anker Inc, Beijing, China
Yuexian Zou
Peking University Shenzhen Graduate School
Machine Learning, Speech Processing, Image Processing