Towards sub-millisecond latency real-time speech enhancement models on hearables

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Problem: Existing speech enhancement methods for resource-constrained hearable devices (e.g., hearing aids) struggle to achieve both ultra-low latency and high-fidelity output with a single microphone.
Method: The authors propose a sub-millisecond real-time speech enhancement framework. It has three parts: a minimum-phase FIR filter that supports sample-by-sample streaming, a lightweight LSTM (626K parameters) that generates the filter taps, and deployment on a custom low-power DSP. The design avoids the phase distortion inherent in conventional spectral masking while remaining feasible on hardware.
Contribution/Results: Mean algorithmic latency ranges from 0.32 to 1.25 ms. Mean end-to-end latency on hardware is 3.35 ms, and the computational cost is 376 MIPS. Objective evaluation shows a mean SI-SDR improvement (SI-SDRi) of 4.1 dB and a DNSMOS gain of 0.2 on unseen recordings. The work targets a latency regime for hearables that the authors describe as underexplored, with the aim of improving the comfort and usability of real-time hearing assistance.
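
The sketch below puts the quoted figures in concrete units. It converts latency in milliseconds to samples, and a MIPS load to an instruction budget per sample. The sample rates (16 kHz and 48 kHz) are assumptions for illustration only, since the summary does not state the rate the system uses. The function names are hypothetical.

```python
# Back-of-envelope conversions for the latency and compute figures quoted above.
# ASSUMPTION: the sample rates are illustrative; the source does not state the rate used.


def latency_in_samples(latency_ms: float, sample_rate_hz: int) -> float:
    """Convert a latency in milliseconds into a number of samples."""
    return latency_ms * sample_rate_hz / 1000.0


def instructions_per_sample(mips: float, sample_rate_hz: int) -> float:
    """Instructions available per input sample for a load given in MIPS
    (millions of instructions per second)."""
    return mips * 1e6 / sample_rate_hz


if __name__ == "__main__":
    for fs in (16_000, 48_000):  # hypothetical sample rates
        lo = latency_in_samples(0.32, fs)
        hi = latency_in_samples(1.25, fs)
        budget = instructions_per_sample(376, fs)
        print(f"{fs} Hz: {lo:.2f}-{hi:.2f} samples of algorithmic latency, "
              f"{budget:.0f} instructions per sample at 376 MIPS")
```

At an assumed 16 kHz rate, 0.32 to 1.25 ms corresponds to roughly 5 to 20 samples, and 376 MIPS leaves about 23,500 instructions per sample. At 48 kHz the same load leaves about 7,800.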

📝 Abstract
Low-latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve a mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach shows generalization with a DNSMOS increase of 0.2 on unseen audio recordings. We use a lightweight LSTM-based model with 626k parameters to generate the FIR taps. In a real hardware implementation on a low-power DSP, our system runs at 376 MIPS with a mean end-to-end latency of 3.35 ms. In addition, we provide a comparison with existing low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and can be used to improve the comfort and usability of hearables.
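
Sample-by-sample FIR processing computes each output as y[n] = sum_k h[k] * x[n - k]. Each output depends only on the current and past inputs, so no frame has to be buffered before output begins. The remaining delay comes from the filter's own impulse response, which a minimum-phase design keeps concentrated in the earliest taps. The sketch below is a minimal illustration, not the authors' implementation. The class `StreamingFIR`, its methods, and the fixed taps are hypothetical; in the described system the taps would come from the LSTM instead.

```python
from collections import deque
from typing import Iterable, List, Sequence


class StreamingFIR:
    """Causal FIR filter that consumes and emits one sample at a time.

    Hypothetical helper for illustration; in the described system the taps
    would be generated by a neural model rather than fixed by hand.
    """

    def __init__(self, taps: Sequence[float]) -> None:
        self.taps: List[float] = list(taps)
        # Most recent input first: history[k] holds x[n - k].
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def set_taps(self, taps: Sequence[float]) -> None:
        """Swap in new taps of the same length (e.g., after a model update)."""
        if len(taps) != len(self.taps):
            raise ValueError("tap count must stay fixed")
        self.taps = list(taps)

    def process_sample(self, x: float) -> float:
        """Return y[n] = sum_k h[k] * x[n - k]; no frame buffering is needed."""
        self.history.appendleft(x)  # the oldest sample drops off the right end
        return sum(h * xk for h, xk in zip(self.taps, self.history))


def filter_stream(fir: StreamingFIR, samples: Iterable[float]) -> List[float]:
    """Run the filter over a stream, emitting one output per input sample."""
    return [fir.process_sample(x) for x in samples]


if __name__ == "__main__":
    fir = StreamingFIR([0.6, 0.3, 0.1])  # hypothetical taps
    print(filter_stream(fir, [1.0, 0.0, 0.0, 0.0]))  # prints [0.6, 0.3, 0.1, 0.0]
```

A unit impulse returns the taps themselves, starting at the very first output sample; this is what makes per-sample streaming attractive for low latency. The source does not say how often the taps would be refreshed (for example, once per model frame), so that is left as an assumption.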
Problem

Achieving sub-millisecond latency in real-time speech enhancement
Fitting speech enhancement into the limited compute budget of resource-constrained hearables
Improving speech quality and comfort in hearing aid applications
Innovation

Minimum-phase FIR filter for sub-millisecond algorithmic latency
Lightweight LSTM model generating FIR taps
Low-power DSP implementation running at 376 MIPS
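
A minimum-phase FIR filter has all of its zeros inside the unit circle. Among causal FIR filters with the same magnitude response, it packs the most energy into its earliest taps, which is why it suits low-latency filtering. The source does not say how the model's taps are made minimum phase. The sketch below only illustrates the property, using textbook zero reflection: each zero z outside the unit circle moves to 1/conj(z), and the gain is scaled by |z|, which leaves the magnitude response unchanged. All function names and the example zeros are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def poly_from_zeros(zeros: Sequence[complex], gain: float = 1.0) -> List[complex]:
    """Taps h[k] of gain * prod_i (1 - z_i * z^-1), lowest delay first."""
    h: List[complex] = [complex(gain)]
    for z0 in zeros:
        # Multiply the current polynomial by (1 - z0 * z^-1).
        shifted = [0j] + [-z0 * c for c in h]
        h = [a + b for a, b in zip(h + [0j], shifted)]
    return h


def reflect_to_minimum_phase(
    zeros: Sequence[complex], gain: float = 1.0
) -> Tuple[List[complex], float]:
    """Move zeros outside the unit circle to 1/conj(z) and scale the gain by |z|,
    which preserves |H(e^{jw})|. Zeros on or inside the circle are kept."""
    new_zeros: List[complex] = []
    new_gain = gain
    for z0 in zeros:
        if abs(z0) > 1.0:
            new_zeros.append(1.0 / z0.conjugate())
            new_gain *= abs(z0)
        else:
            new_zeros.append(z0)
    return new_zeros, new_gain


def magnitude(h: Sequence[complex], w: float) -> float:
    """|H(e^{jw})| for FIR taps h at normalized angular frequency w."""
    return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h)))


def partial_energy(h: Sequence[complex], n: int) -> float:
    """Energy contained in the first n taps."""
    return sum(abs(c) ** 2 for c in h[:n])


if __name__ == "__main__":
    zeros = [2.0 + 0j, -0.5 + 0j]  # hypothetical: one zero outside the unit circle
    h_mixed = poly_from_zeros(zeros)
    mp_zeros, mp_gain = reflect_to_minimum_phase(zeros)
    h_min = poly_from_zeros(mp_zeros, mp_gain)
    for w in (0.0, math.pi / 2, math.pi):
        print(f"w={w:.2f}: |H_mixed|={magnitude(h_mixed, w):.3f} "
              f"|H_min|={magnitude(h_min, w):.3f}")
    print("energy in first tap:", partial_energy(h_mixed, 1), "vs", partial_energy(h_min, 1))
```

With the hypothetical zeros {2, -0.5}, the mixed-phase taps are [1, -1.5, -1] and the minimum-phase taps are [2, 0, -0.5]. The magnitude responses match and total energy is 4.25 in both cases. However, the first tap of the minimum-phase version carries 4.0 of that energy versus 1.0.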