🤖 AI Summary
This work addresses real-time multi-channel speech enhancement under unconstrained microphone arrays, where the number and geometric configuration of microphones are variable. We propose a lightweight attention-based beamforming network built on a three-stage feature extraction and fusion framework, incorporating cross-channel attention to achieve, for the first time, microphone-invariant modeling at extremely low computational complexity. The method enables adaptive feature aggregation across arbitrary numbers and configurations of microphones. By integrating time-frequency domain beamforming with a compact neural architecture, it achieves real-time inference on edge devices. Experiments demonstrate that the model achieves superior speech quality and intelligibility over existing lightweight approaches while keeping the parameter count under 1M and the computational cost under 0.5 GFLOPs. Crucially, it maintains robust performance across diverse array geometries, establishing a new paradigm for on-device speech enhancement in unconstrained array scenarios.
📄 Abstract
Multichannel speech enhancement (SE) aims to restore clean speech from noisy measurements by leveraging spatiotemporal signal features. In ad-hoc array conditions, microphone invariance (MI) requires systems to handle varying microphone numbers and array geometries. From a practical perspective, multichannel recordings inevitably increase the computational burden for edge-device applications, highlighting the need for lightweight and efficient deployment. In this work, we propose a lightweight attentive beamforming network (LABNet) to integrate MI into a low-complexity real-time SE system. We design a three-stage framework for efficient intra-channel modeling and inter-channel interaction. A cross-channel attention module is developed to selectively aggregate features from each channel. Experimental results demonstrate that LABNet achieves impressive performance with ultra-light resource overhead while maintaining MI, indicating great potential for ad-hoc array processing.
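To illustrate why attention-based aggregation yields microphone invariance, the sketch below shows a minimal cross-channel attention step in NumPy. This is not the paper's LABNet implementation; the reference-channel query, the single-head formulation, and all weight names (`Wq`, `Wk`, `Wv`) are illustrative assumptions. The key property it demonstrates is that the same fixed parameters handle any number of channels, because attention weights are computed over the channel axis rather than being tied to a fixed array size.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_channel_attention(feats, Wq, Wk, Wv):
    """Aggregate per-channel features into one stream via attention.

    feats: (C, T, D) features for C microphones, T frames, D dims.
    C may vary freely; Wq/Wk/Wv are (D, D) and independent of C.
    Returns: (T, D) fused features.
    """
    C, T, D = feats.shape
    q = feats[0] @ Wq            # query from a reference channel, (T, D)
    k = feats @ Wk               # keys per channel, (C, T, D)
    v = feats @ Wv               # values per channel, (C, T, D)
    # per-frame attention scores over the channel axis
    scores = np.einsum('td,ctd->tc', q, k) / np.sqrt(D)   # (T, C)
    weights = softmax(scores, axis=-1)                    # (T, C)
    return np.einsum('tc,ctd->td', weights, v)            # (T, D)

rng = np.random.default_rng(0)
D = 8
Wq, Wk, Wv = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
# the same parameters work for 2, 4, or 6 microphones
for C in (2, 4, 6):
    fused = cross_channel_attention(rng.standard_normal((C, 5, D)), Wq, Wk, Wv)
    assert fused.shape == (5, D)
```

Because the channel count only appears as a summation axis, the parameter count stays fixed regardless of the array, which is the property the MI requirement asks for.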