LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement

📅 2025-07-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses real-time multichannel speech enhancement with unconstrained (ad-hoc) microphone arrays, where the number and geometric configuration of microphones can vary. We propose a lightweight attention-based beamforming network built on a three-stage feature extraction and fusion framework. Cross-channel attention is used to achieve microphone-invariant modeling for the first time at extremely low computational complexity, enabling adaptive feature aggregation across arbitrary numbers and configurations of microphones. By combining time-frequency domain beamforming with a compact neural architecture, the model achieves real-time inference on edge devices. Experiments show better speech quality and intelligibility than existing lightweight approaches, with fewer than 1M parameters and a computational cost below 0.5 GFLOPs. The model also remains robust across diverse array geometries, which makes it a promising approach to on-device speech enhancement with unconstrained arrays.
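The summary mentions time-frequency domain beamforming without giving a formula. As a rough illustration, the sketch below uses the generic filter-and-sum form, where each channel's STFT coefficient is multiplied by a conjugated complex weight and the results are summed over channels. The function name `filter_and_sum`, the nested-list layout, and the idea that the weights are supplied from outside (in LABNet they would presumably be estimated by the network) are assumptions of this sketch, not details taken from the paper.

```python
from typing import List

# One complex STFT coefficient per (channel, frame, frequency bin).
Spectrogram = List[List[List[complex]]]  # indexed as [channel][frame][freq]


def filter_and_sum(noisy: Spectrogram, weights: Spectrogram) -> List[List[complex]]:
    """Apply per-bin complex weights and sum over channels.

    enhanced[t][f] = sum_c conj(weights[c][t][f]) * noisy[c][t][f]

    The channel count is not fixed; it only has to match between inputs.
    """
    if not noisy or len(noisy) != len(weights):
        raise ValueError("inputs need the same, non-zero number of channels")
    n_frames, n_bins = len(noisy[0]), len(noisy[0][0])
    enhanced = [[0j] * n_bins for _ in range(n_frames)]
    for x_c, w_c in zip(noisy, weights):  # loop over microphones
        for t in range(n_frames):
            for f in range(n_bins):
                enhanced[t][f] += w_c[t][f].conjugate() * x_c[t][f]
    return enhanced
```

With two channels and every weight set to 0.5, the output is simply the per-bin average of the two channels, which is a convenient sanity check before plugging in learned weights.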

πŸ“ Abstract
Multichannel speech enhancement (SE) aims to restore clean speech from noisy measurements by leveraging spatiotemporal signal features. In ad-hoc array conditions, microphone invariance (MI) requires systems to handle different microphone numbers and array geometries. From a practical perspective, multichannel recordings inevitably increase the computational burden for edge-device applications, highlighting the necessity of lightweight and efficient deployments. In this work, we propose a lightweight attentive beamforming network (LABNet) to integrate MI into a low-complexity real-time SE system. We design a three-stage framework for efficient intra-channel modeling and inter-channel interaction. A cross-channel attention module is developed to selectively aggregate features from each channel. Experimental results demonstrate that LABNet achieves impressive performance with ultra-light resource overhead while maintaining MI, indicating great potential for ad-hoc array processing.
Problem

Research questions and friction points this paper is trying to address.

Handles varying microphone numbers and geometries in ad-hoc arrays
Reduces computational burden for real-time edge-device speech enhancement
Achieves lightweight deployment with microphone invariance and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight attentive beamforming network for SE
Three-stage framework for efficient channel modeling
Cross-channel attention module for selective feature aggregation
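To make the idea of selective aggregation across channels more concrete, here is a minimal sketch of attention pooling over the microphone axis. It is not the paper's module: the single query vector `query`, the scaled dot-product scoring, and the function name `cross_channel_attention` are all assumptions chosen for illustration. What it does show is why this kind of fusion is microphone-invariant: the output has a fixed size regardless of how many channels are fed in, and reordering the channels does not change the result.

```python
import math
from typing import List, Sequence


def cross_channel_attention(features: Sequence[Sequence[float]],
                            query: Sequence[float]) -> List[float]:
    """Fuse per-channel feature vectors into one vector via softmax attention.

    features: one feature vector per microphone, each of length len(query).
    query:    hypothetical learned query vector (an assumption of this sketch).
    """
    if not features:
        raise ValueError("need at least one channel")
    dim = len(query)
    # Scaled dot-product score for each channel.
    scores = [sum(q * h for q, h in zip(query, feat)) / math.sqrt(dim)
              for feat in features]
    # Numerically stable softmax over the channel axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Weighted sum over channels: output size is independent of channel count.
    return [sum(a * feat[d] for a, feat in zip(attn, features))
            for d in range(dim)]
```

Calling the function with two, three, or more feature vectors returns a vector of the same length each time, so the stages that follow never need to know the array size.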
Haoyin Yan
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Jie Zhang
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Chengqian Jiang
NERC-SLIP, University of Science and Technology of China (USTC), Hefei, China
Shuang Zhang
Chair Professor, University of Hong Kong
metamaterials, topological photonics, metasurfaces, plasmonics, nonlinear optics