Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

📅 2025-01-14



🤖 AI Summary

To address the significant degradation in automatic speech recognition (ASR) performance of voice-driven applications (VDAs) when loudspeakers introduce most of the noise (e.g., loud music playback), this paper proposes a robust loudspeaker beamforming algorithm. The beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the device that performs ASR. A distortion measure based on human auditory perception limits the distortion perceived by listeners, and the algorithm can trade a further reduction of acoustic energy around the VDA device against reduced objective audio quality at the listener's location. Simulations and real-world experiments show improved speech recognition performance in all tested scenarios.


📝 Abstract

In this paper we propose a robust loudspeaker beamforming algorithm that enhances the performance of voice driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g. when music is playing loudly. The loudspeaker beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the device that implements automatic speech recognition for a voice driven application (VDA). The algorithm utilises a distortion measure based on human auditory perception to limit the distortion perceived by human listeners. Simulations and real-world experiments show that the proposed loudspeaker beamformer improves the speech recognition performance in all tested scenarios. Moreover, the algorithm allows the acoustic energy around the VDA device to be reduced further at the expense of reduced objective audio quality at the listener's location.
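The abstract does not give the beamformer's equations, so the following is only a hedged sketch of the underlying idea: choosing loudspeaker weights that trade acoustic energy at the VDA microphones against distortion at the listener. It swaps in a plain regularised least-squares (pressure-matching style) cost for a single frequency bin, using a squared-error distortion term rather than the paper's auditory-perception-based measure. The function names, the trade-off weight `mu`, the diagonal loading `delta`, and the random transfer functions are all hypothetical.

```python
import numpy as np


def ls_loudspeaker_weights(H_vda, H_lis, d, mu, delta=1e-9):
    """Hypothetical regularised least-squares loudspeaker weights for ONE frequency bin.

    Minimises  ||H_vda w||^2 + mu * ||H_lis w - d||^2 + delta * ||w||^2
      H_vda: (M_vda, L) transfer functions, L loudspeakers -> VDA microphones
      H_lis: (M_lis, L) transfer functions, L loudspeakers -> listener points
      d:     (M_lis,)  desired (unprocessed) pressure at the listener points
      mu:    trade-off weight; large mu = little listener distortion,
             small mu = stronger suppression at the VDA device
      delta: small diagonal loading so the linear solve stays well posed
    Setting the gradient w.r.t. conj(w) to zero gives the linear system
      (H_vda^H H_vda + mu H_lis^H H_lis + delta I) w = mu H_lis^H d.
    NOTE: squared error stands in for the paper's perceptual distortion measure.
    """
    L = H_vda.shape[1]
    A = (H_vda.conj().T @ H_vda
         + mu * (H_lis.conj().T @ H_lis)
         + delta * np.eye(L))
    b = mu * (H_lis.conj().T @ d)
    return np.linalg.solve(A, b)


def energy(H, w):
    """Acoustic energy (sum of squared pressure magnitudes) at the points in H."""
    return float(np.sum(np.abs(H @ w) ** 2))


def relative_distortion(H_lis, w, d):
    """Normalised squared error between achieved and desired listener pressure."""
    return float(np.sum(np.abs(H_lis @ w - d) ** 2) / np.sum(np.abs(d) ** 2))


# Toy scenario with random (hypothetical) transfer functions for one bin.
rng = np.random.default_rng(0)
L, M_VDA, M_LIS = 3, 3, 2
H_vda = rng.standard_normal((M_VDA, L)) + 1j * rng.standard_normal((M_VDA, L))
H_lis = rng.standard_normal((M_LIS, L)) + 1j * rng.standard_normal((M_LIS, L))

w_ref = np.ones(L, dtype=complex)  # unprocessed playback: every speaker plays the signal
d = H_lis @ w_ref                  # what the listener hears without beamforming

for mu in (0.01, 1.0, 100.0):
    w = ls_loudspeaker_weights(H_vda, H_lis, d, mu)
    print(f"mu={mu:g}: VDA energy {energy(H_vda, w):.3g} "
          f"(ref {energy(H_vda, w_ref):.3g}), "
          f"listener distortion {relative_distortion(H_lis, w, d):.3g}")
```

Sweeping `mu` mirrors, in simplified form, the trade-off the abstract describes: a small `mu` drives the energy at the VDA device towards zero (essentially by muting the playback), while a large `mu` keeps the listener's signal nearly intact and suppresses the VDA energy only as far as the remaining degrees of freedom allow. A perceptual distortion measure, as used in the paper, would replace the squared-error term.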

Problem

Research questions and friction points this paper is trying to address.

Speech Recognition

Noise Environment

User Experience

Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise Reduction

Speech Recognition Accuracy

Perceptual Audio Quality Preservation
