Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation of automatic speech recognition (ASR) in multichannel, multi-speaker scenarios caused by overlapping speech, noise, and reverberation, this paper proposes a data-independent, training-free beamforming front end. The approach processes specific angular sectors, defined by their spherical polar coordinates, enhancing signals from each target sector while attenuating interference from other directions, and then passes the beamformed signals to an end-to-end multichannel, multi-speaker ASR system. A group of beamformed signals yields better ASR performance than the same number of raw microphone signals, and using more microphone signals for beamforming improves accuracy further. The multichannel input is therefore exploited more efficiently while the overall input load on the ASR system is reduced. Experiments on the AMI meeting corpus show that the method reduces word error rate (WER) by up to 11% and improves speaker counting accuracy by up to 27% relative, compared with a multichannel ASR baseline that does not use beamforming.
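The sector-partitioning step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the sectors are equal-width azimuth slices at one fixed elevation and that each sector is represented by the unit vector pointing at its centre; the helper names `polar_to_unit_vector` and `sector_look_directions` and the parameters `num_sectors` and `elevation_deg` are hypothetical, and the paper may partition the sphere differently.

```python
import numpy as np


def polar_to_unit_vector(azimuth_rad, elevation_rad):
    """Convert spherical polar angles to a Cartesian unit vector.

    Assumed convention: azimuth is measured in the x-y plane from the x-axis,
    elevation is measured upward from the x-y plane.
    """
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])


def sector_look_directions(num_sectors, elevation_deg=0.0):
    """Hypothetical helper: split 360 degrees of azimuth into equal sectors.

    Returns the (start, end) azimuth bounds of each sector in degrees and a
    (num_sectors, 3) array of unit vectors pointing at the sector centres.
    """
    width = 360.0 / num_sectors
    bounds = [(k * width, (k + 1) * width) for k in range(num_sectors)]
    centres_rad = np.deg2rad([(start + end) / 2.0 for start, end in bounds])
    elevation_rad = np.deg2rad(elevation_deg)
    directions = np.stack(
        [polar_to_unit_vector(az, elevation_rad) for az in centres_rad]
    )
    return bounds, directions


bounds, directions = sector_look_directions(num_sectors=4)
print(bounds)  # [(0.0, 90.0), (90.0, 180.0), (180.0, 270.0), (270.0, 360.0)]
```

Each row of `directions` can then serve as the look direction of one fixed beam, so the number of sectors directly sets the number of beamformed signals fed to the recognizer.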

📝 Abstract
Automatic speech recognition (ASR) in multichannel, multi-speaker scenarios remains challenging due to ambient noise, reverberation and overlapping speakers. In this paper, we propose a beamforming approach that processes specific angular sectors based on their spherical polar coordinates before applying an end-to-end multichannel, multi-speaker ASR system. This method is data-independent and training-free. We demonstrate that using a group of beamformed signals improves ASR performance compared to using the same number of raw microphone signals. Moreover, increasing the number of signals used for beamforming further enhances recognition accuracy, leading to a more efficient use of multichannel signals while reducing the overall input load for the ASR system. We conduct experiments on the AMI meeting corpus, where the proposed method reduces word error rate by up to 11% and improves speaker counting accuracy by up to 27% relative compared to a multichannel ASR baseline system that does not exploit beamforming.
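Because the reported gains are relative, it may help to spell out the arithmetic. Assuming "relative" applies to both the WER and the speaker-counting figures, as the sentence's phrasing suggests, the two quantities would be computed as follows; the subscripts "base" (baseline without beamforming) and "bf" (with beamforming) are notation introduced here, not taken from the paper.

```latex
\Delta_{\mathrm{WER}}^{\mathrm{rel}}
  = \frac{\mathrm{WER}_{\mathrm{base}} - \mathrm{WER}_{\mathrm{bf}}}{\mathrm{WER}_{\mathrm{base}}},
\qquad
\Delta_{\mathrm{Acc}}^{\mathrm{rel}}
  = \frac{\mathrm{Acc}_{\mathrm{bf}} - \mathrm{Acc}_{\mathrm{base}}}{\mathrm{Acc}_{\mathrm{base}}}
```

As a purely hypothetical illustration, a baseline WER of 30% reduced by 11% relative becomes 30 × 0.89 = 26.7%, which is a 3.3 percentage-point absolute reduction. These numbers are illustrative only and are not results from the paper.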
Problem

Research questions and friction points this paper is trying to address.

Beamforming for multichannel multi-speaker ASR
Addressing noise, reverberation, and overlapping speakers
Improving recognition accuracy and speaker counting performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-independent beamforming using spherical polar coordinates
Training-free processing of specific angular sectors
Multiple beamformed signals enhance ASR performance
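To make "data-independent beamforming" concrete, here is a minimal NumPy sketch of a frequency-domain delay-and-sum beamformer, a classic fixed beamformer whose weights depend only on the array geometry and the look direction, never on the recorded signals. The abstract does not name the beamformer the authors use, so delay-and-sum, the far-field plane-wave model, the assumed speed of sound of 343 m/s, and the helper names `steering_vectors` and `delay_and_sum_bank` are all assumptions for illustration. Each look direction (for example, one per angular sector) produces one output channel, so K beams can stand in for M raw microphone channels at the ASR input.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vectors(mic_positions, look_dirs, freqs, c=SPEED_OF_SOUND):
    """Far-field steering vectors of shape (K, F, M).

    mic_positions: (M, 3) microphone coordinates in metres.
    look_dirs: (K, 3) unit vectors pointing from the array towards each source.
    freqs: (F,) STFT bin frequencies in Hz.
    """
    # A plane wave from direction u reaches mic m earlier than the array origin
    # by (r_m . u) / c, i.e. a phase of +2*pi*f*tau under NumPy's FFT convention.
    advances = look_dirs @ mic_positions.T / c  # (K, M)
    phase = 2j * np.pi * freqs[None, :, None] * advances[:, None, :]  # (K, F, M)
    return np.exp(phase)


def delay_and_sum_bank(stft, mic_positions, look_dirs, freqs):
    """Hypothetical helper: one delay-and-sum beam per look direction.

    stft: complex multichannel STFT of shape (M, F, T).
    Returns beamformed STFTs of shape (K, F, T).
    """
    num_mics = mic_positions.shape[0]
    # Fixed weights: computed from geometry only, no signal statistics needed.
    weights = steering_vectors(mic_positions, look_dirs, freqs) / num_mics
    # y_k(f, t) = sum_m conj(w_{k,f,m}) * x_m(f, t)
    return np.einsum("kfm,mft->kft", weights.conj(), stft)


# Usage: a 4-mic linear array along x, and a unit-amplitude plane wave from +x.
mics = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
freqs = np.array([1000.0, 2000.0])
src = np.array([[1.0, 0.0, 0.0]])
x = steering_vectors(mics, src, freqs)[0].T[:, :, None]  # (M, F, T=1)
look = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # on-target and off-target
y = delay_and_sum_bank(x, mics, look, freqs)
# The on-target beam keeps magnitude ~1.0; the off-target beam is attenuated.
print(np.round(np.abs(y[:, :, 0]), 3))
```

Delay-and-sum is chosen here only because it is the simplest fixed beamformer; adaptive designs such as MVDR would instead require estimating noise statistics from the data, which is what the training-free, data-independent framing avoids.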