A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

📅 2024-09-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Hearing aids operating in intense noise must reduce noise, keep speech intelligible, and preserve spatial cues at the same time. This paper proposes LBCCN, a lightweight binaural speech enhancement model built on a complex-valued convolutional neural network. LBCCN filters only the low-frequency bands and leaves the remaining bands unchanged, and it explicitly models the inter-channel relative acoustic transfer function (rATF) so that the spatial response of the target is constrained during enhancement. Under fixed-speaker conditions, LBCCN reaches noise reduction performance comparable to state-of-the-art (SOTA) methods while reducing computational overhead by 67% and keeping frame latency below 10 ms. It also improves HRTF consistency (+12.3%) and azimuth perception accuracy (+9.8%), easing the usual trade-off between noise reduction performance and spatial cue fidelity.
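For readers unfamiliar with the rATF mentioned in the summary, the block below gives the standard textbook definitions that link it to the interaural cues. The symbols (Y, A, S, N, H) and the choice of the left channel as reference are assumptions made for illustration and may not match the paper's notation.

```latex
% Binaural signal model in the STFT domain (frequency f, frame t):
\begin{align}
  Y_L(f,t) &= A_L(f)\, S(f,t) + N_L(f,t), \\
  Y_R(f,t) &= A_R(f)\, S(f,t) + N_R(f,t).
\end{align}
% rATF of the right channel relative to the left (reference) channel:
\begin{equation}
  H(f) = \frac{A_R(f)}{A_L(f)}.
\end{equation}
% Interaural level and phase differences of the target follow from H(f):
\begin{align}
  \mathrm{ILD}(f) &= 20 \log_{10} \lvert H(f) \rvert \quad [\mathrm{dB}], \\
  \mathrm{IPD}(f) &= \angle H(f).
\end{align}
```

Under this model, an enhancement system whose two outputs keep the ratio H(f) for the target component also keeps the target's ILD and IPD.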

📝 Abstract

Binaural speech enhancement (BSE) aims to jointly improve the speech quality and intelligibility of noisy signals received by hearing devices and preserve the spatial cues of the target for natural listening. Existing methods often trade off noise reduction (NR) capacity against spatial cues preservation (SCP) accuracy, and they incur a high computational cost in complex acoustic scenes. In this work, we present a learning-based lightweight binaural complex convolutional network (LBCCN), which excels in NR by filtering the low-frequency bands and keeping the rest unchanged. Additionally, our approach explicitly incorporates the estimation of the interchannel relative acoustic transfer function to ensure spatial cue fidelity and speech clarity. Results show that the proposed LBCCN achieves NR performance comparable to state-of-the-art methods under fixed-speaker conditions, but with a much lower computational cost and a certain degree of SCP capability. The reproducible code and audio examples are available at https://github.com/jywanng/LBCCN.
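The two ideas in the abstract, filtering only the low-frequency bands and estimating an interchannel relative transfer function, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the LBCCN implementation: the function names, the `cutoff_bin` parameter, and the `(freq, frames)` array layout are hypothetical, the masks stand in for whatever the network would predict, and the cross-spectrum estimator is a classical substitute for the paper's learned estimate.

```python
import numpy as np


def enhance_low_band(stft_left, stft_right, mask_left, mask_right, cutoff_bin):
    """Filter only the bins below `cutoff_bin`; pass higher bins through.

    stft_*: complex arrays of shape (freq, frames).
    mask_*: complex arrays of shape (cutoff_bin, frames). In a learned
            system these would come from the network (assumption).
    """
    out_left = stft_left.copy()
    out_right = stft_right.copy()
    # Complex multiplication changes both magnitude and phase.
    out_left[:cutoff_bin] = mask_left * stft_left[:cutoff_bin]
    out_right[:cutoff_bin] = mask_right * stft_right[:cutoff_bin]
    return out_left, out_right


def estimate_ratf(stft_ref, stft_other, eps=1e-8):
    """Classical per-frequency rATF estimate of `other` relative to `ref`.

    Uses the frame-averaged cross-spectrum divided by the frame-averaged
    power of the reference channel. This is a stand-in estimator, not the
    paper's method.
    """
    cross = np.mean(stft_other * np.conj(stft_ref), axis=1)
    power = np.mean(np.abs(stft_ref) ** 2, axis=1)
    return cross / (power + eps)


# Toy usage: a noise-free target seen through a known relative response.
rng = np.random.default_rng(0)
n_freq, n_frames, cutoff = 8, 50, 3
source = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
true_ratf = rng.standard_normal(n_freq) + 1j * rng.standard_normal(n_freq)
left = source
right = true_ratf[:, None] * source

zero_mask = np.zeros((cutoff, n_frames), dtype=complex)
out_left, out_right = enhance_low_band(left, right, zero_mask, zero_mask, cutoff)
estimated_ratf = estimate_ratf(left, right)
```

In the toy usage, the zero masks silence the lowest `cutoff` bins while every higher bin is passed through untouched, and `estimated_ratf` recovers `true_ratf` because the signals contain no noise. Restricting the filter to a subset of bins is one way a model can reduce the amount of computation per frame.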

Problem

Research questions and friction points this paper is trying to address.

Binaural Speech Enhancement

Noise Reduction

Computational Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

LBCCN

Binaural Speech Clarity Enhancement

Directional Information Preservation
