A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

📅 2024-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving effective noise reduction, high speech intelligibility, and faithful preservation of spatial cues in hearing aids under intense noise conditions, this paper proposes LBCCN, a lightweight binaural speech enhancement model. LBCCN is the first to jointly incorporate low-frequency band-selective filtering with explicit modeling of inter-channel relative acoustic transfer functions (rATFs), implemented within a complex-valued convolutional neural network framework. It integrates band-adaptive filtering, complex spectral modeling, and spatial response constraint optimization. Evaluated on fixed-speaker scenarios, LBCCN achieves denoising performance comparable to state-of-the-art (SOTA) methods while reducing computational overhead by 67% and maintaining frame latency below 10 ms. Moreover, it significantly improves HRTF consistency (+12.3%) and azimuth perception accuracy (+9.8%), easing the conventional trade-off between noise reduction performance and spatial cue fidelity.

📝 Abstract
Binaural speech enhancement (BSE) aims to jointly improve the speech quality and intelligibility of noisy signals received by hearing devices and preserve the spatial cues of the target for natural listening. Existing methods often suffer from the compromise between noise reduction (NR) capacity and spatial cues preservation (SCP) accuracy and a high computational demand in complex acoustic scenes. In this work, we present a learning-based lightweight binaural complex convolutional network (LBCCN), which excels in NR by filtering low-frequency bands and keeping the rest. Additionally, our approach explicitly incorporates the estimation of interchannel relative acoustic transfer function to ensure the spatial cues fidelity and speech clarity. Results show that the proposed LBCCN can achieve a comparable NR performance to state-of-the-art methods under fixed-speaker conditions, but with a much lower computational cost and a certain degree of SCP capability. The reproducible code and audio examples are available at https://github.com/jywanng/LBCCN.
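To make the role of the interchannel relative acoustic transfer function (rATF) more concrete, the following is a minimal sketch of a classical cross-spectral rATF estimate for a single frequency bin, together with the two spatial cues it encodes (interaural level and phase differences). This is not the learned estimator used in LBCCN; the function names `estimate_ratf` and `spatial_cues`, the choice of the left channel as reference, and the toy data are all assumptions made for illustration.

```python
import cmath
import math


def estimate_ratf(left_bins, right_bins, eps=1e-12):
    """Least-squares rATF estimate for one frequency bin.

    left_bins / right_bins: complex STFT coefficients of the same bin
    across time frames. The left channel is taken as the reference
    (an assumption of this sketch), so right ~= ratf * left.
    """
    # Cross-spectrum between right and left, summed over frames.
    cross = sum(r * l.conjugate() for l, r in zip(left_bins, right_bins))
    # Auto-spectrum (power) of the reference channel.
    power = sum(abs(l) ** 2 for l in left_bins)
    return cross / (power + eps)


def spatial_cues(ratf, eps=1e-12):
    """Return (ILD in dB, IPD in radians) encoded by an rATF value."""
    ild_db = 20.0 * math.log10(max(abs(ratf), eps))
    ipd_rad = cmath.phase(ratf)
    return ild_db, ipd_rad


# Toy data: the right channel is the left channel scaled by 0.5
# and phase-shifted by 45 degrees (hypothetical values).
true_ratf = 0.5 * cmath.exp(1j * math.pi / 4)
left = [1 + 0j, 0.5 - 0.5j, -0.3 + 0.8j, 0.2 + 0.1j]
right = [true_ratf * l for l in left]

ratf = estimate_ratf(left, right)
ild_db, ipd_rad = spatial_cues(ratf)
print(ratf, ild_db, ipd_rad)
```

In this noise-free toy case the estimate recovers the scaling and phase shift applied to the right channel. If an enhancement system keeps the rATF of its two outputs close to that of the clean target, the listener's level and phase cues are preserved, which is the intuition behind the spatial cues preservation goal stated in the abstract.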
Problem

Research questions and friction points this paper is trying to address.

Binaural Speech Enhancement
Noise Reduction
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LBCCN
Binaural Speech Clarity Enhancement
Directional Information Preservation
Jingyuan Wang
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Jie Zhang
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Shihao Chen
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Miao Sun
WeRide
Computer Vision, Autonomous Driving