A lightweight and robust method for blind wideband-to-fullband extension of speech

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited quality of wideband coding in low-resource speech communication (e.g., voice calls over weak networks), this paper proposes a lightweight, blind wideband-to-fullband speech bandwidth extension method. The approach incorporates classical speech coding principles into a compact neural network with only about 370K parameters and a complexity of about 140 MFLOPS, operating on 10-ms frames with a lookahead of just 0.27 ms. It employs joint time-frequency modeling, fusing MFCCs and wideband spectral features to learn the bandwidth mapping end-to-end, without relying on consonant detection or spectral guidance. Paired with the Opus SILK codec (1.5 release), the method significantly improves speech quality from 6 to 12 kb/s in a P.808 DCR listening test. At 9 kb/s, it achieves an EVSS MOS score of 4.2+, matching the perceptual quality of 3GPP EVS at 9.6 kb/s and Opus 1.4 at 18 kb/s, which shows that blind bandwidth extension can reach the quality of classical guided bandwidth extension at very low bitrates.

📝 Abstract
Reducing the bandwidth of speech is common practice in resource-constrained environments like low-bandwidth speech transmission or low-complexity vocoding. We propose a lightweight and robust method for extending the bandwidth of wideband speech signals that is inspired by classical methods developed in the speech coding context. The resulting model has just ~370K parameters and a complexity of ~140 MFLOPS (or ~70 MMACS). With a frame size of 10 ms and a lookahead of just 0.27 ms, the model is well-suited for common wideband speech codecs. We evaluate the model's robustness by pairing it with the Opus SILK speech codec (1.5 release) and verify in a P.808 DCR listening test that it significantly improves quality from 6 to 12 kb/s. We also demonstrate that Opus 1.5 together with the proposed bandwidth extension at 9 kb/s meets the quality of 3GPP EVS at 9.6 kb/s and that of Opus 1.4 at 18 kb/s, showing that the blind bandwidth extension can meet the quality of classical guided bandwidth extensions.
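The figures quoted in the abstract can be turned into a rough per-frame budget. The sketch below is only back-of-the-envelope arithmetic, not the authors' code. It assumes the conventional sampling rates of 16 kHz for wideband and 48 kHz for fullband, and it counts one multiply-accumulate as two floating-point operations; neither assumption is stated in the abstract. The function names `samples` and `budget` are hypothetical.

```python
# Back-of-the-envelope budget from the numbers quoted in the abstract.
# Assumptions (NOT stated in the abstract): 16 kHz wideband input,
# 48 kHz fullband output, and 1 MAC counted as 2 floating-point operations.

FRAME_MS = 10.0            # frame size from the abstract
LOOKAHEAD_MS = 0.27        # lookahead from the abstract
COMPLEXITY_MFLOPS = 140.0  # complexity from the abstract
WIDEBAND_HZ = 16_000       # assumed wideband sampling rate
FULLBAND_HZ = 48_000       # assumed fullband sampling rate


def samples(duration_ms: float, rate_hz: int) -> float:
    """Number of samples covered by duration_ms at rate_hz."""
    return duration_ms * rate_hz / 1000.0


def budget() -> dict:
    """Derive per-frame quantities from the quoted figures."""
    frames_per_second = 1000.0 / FRAME_MS
    return {
        # 2 FLOPs per MAC gives the ~70 MMACS quoted next to ~140 MFLOPS
        "mmacs": COMPLEXITY_MFLOPS / 2.0,
        # operations available for each 10 ms frame
        "mflop_per_frame": COMPLEXITY_MFLOPS / frames_per_second,
        "input_samples_per_frame": samples(FRAME_MS, WIDEBAND_HZ),
        "output_samples_per_frame": samples(FRAME_MS, FULLBAND_HZ),
        # 0.27 ms is a rounded value, so this is approximate
        "lookahead_samples_at_48k": samples(LOOKAHEAD_MS, FULLBAND_HZ),
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, each frame maps 160 input samples to 480 output samples with about 1.4 MFLOP of work, and the lookahead corresponds to roughly 13 samples at 48 kHz.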
Problem

Research questions and friction points this paper is trying to address.

Resource-constrained environment
Wideband speech coding
Speech clarity and quality enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wideband to Fullband Conversion
Low-complexity Model
Speech Quality Enhancement