PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To balance efficiency and quality in neural speech coding under extreme low-resource constraints (computation below 700 MFLOPs, latency below 30 ms, and dual-bitrate support at 1 kbps and 6 kbps), this paper proposes PhoenixCodec, an efficient neural speech codec framework. It introduces: (1) an optimized asymmetric frequency-time encoder-decoder architecture that alleviates the resource scattering of conventional decoders; (2) a Cyclical Calibration and Refinement (CCR) training strategy intended to help training escape local optima; and (3) noise-invariant fine-tuning on noisy samples to improve robustness in challenging acoustic conditions such as real-world noise and reverberation. In Track 1 of the LRAC 2025 Challenge, the system ranked third overall and, at 1 kbps, achieved the best performance under real-world noise and reverberation as well as the best intelligibility on clean speech.

📝 Abstract

This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency below 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to escape local optima, and enhancing robustness through noisy-sample fine-tuning. In the LRAC 2025 Challenge Track 1, the proposed system ranked third overall and, at 1 kbps, achieved the best performance both under real-world noise and reverberation and in clean-speech intelligibility tests, confirming its effectiveness.
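To give a feel for how tight the 1 kbps and 6 kbps targets are, the following is a rough back-of-the-envelope sketch of the bit budget per frame. The frame rate (50 frames per second) and the codebook size (1024 entries) are hypothetical values chosen only for illustration; the abstract does not state PhoenixCodec's actual frame rate or quantizer configuration.

```python
import math


def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bits available to encode one frame at the given bitrate."""
    return bitrate_bps / frame_rate_hz


def max_codebooks(bitrate_bps: float, frame_rate_hz: float, codebook_size: int) -> int:
    """Largest number of codebook indices that fit into one frame's bit budget."""
    bits_per_index = math.log2(codebook_size)
    return int(bits_per_frame(bitrate_bps, frame_rate_hz) // bits_per_index)


# Hypothetical settings, NOT taken from the paper.
FRAME_RATE_HZ = 50.0   # one frame every 20 ms
CODEBOOK_SIZE = 1024   # 10 bits per codebook index

for bitrate_bps in (1000, 6000):  # the two target rates: 1 kbps and 6 kbps
    budget = bits_per_frame(bitrate_bps, FRAME_RATE_HZ)
    count = max_codebooks(bitrate_bps, FRAME_RATE_HZ, CODEBOOK_SIZE)
    print(f"{bitrate_bps} bps: {budget:.0f} bits/frame, up to {count} codebook indices")
```

Under these assumed settings, the 1 kbps mode leaves only 20 bits per frame, which is why every design choice in the codec has to be spent carefully at the low rate.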

Problem

Research questions and friction points this paper is trying to address.

Optimizing neural speech coding for extreme low-resource conditions

Balancing efficiency and quality under strict computational constraints

Improving robustness in noisy and reverberant environments and intelligibility in clean-speech tests

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized asymmetric frequency-time architecture integration

Cyclical Calibration and Refinement training strategy implementation

Noise-invariant fine-tuning for enhanced robustness
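The summary above does not describe how noisy samples are produced for fine-tuning. As a minimal sketch of one common way to build such samples, the code below mixes a noise signal into clean speech at a chosen signal-to-noise ratio. This is a generic augmentation technique, not necessarily the authors' procedure; the function names `rms` and `mix_at_snr` and the synthetic test signals are assumptions made for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mixture has the target SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(speech)  # silent noise: nothing to add
    # SNR_dB = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for real recordings (hypothetical data).
random.seed(0)
sample_rate = 16000
speech = [math.sin(2.0 * math.pi * 220.0 * t / sample_rate) for t in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

A fine-tuning set built this way would typically draw the SNR at random from a range for each training example, so the model sees both mild and severe degradation.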
