Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

📅 2025-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the significant degradation in automatic speech recognition (ASR) performance of voice driven applications (VDAs) under strong loudspeaker interference (e.g., loud music playback), this paper proposes a robust loudspeaker beamforming method. The beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the VDA device, and limits perceived distortion using a measure based on human auditory perception, which gives a controllable trade-off between ASR robustness and audio quality at the listener's location. Simulations and real-world experiments show improved speech recognition performance in all tested scenarios; reducing the acoustic energy around the VDA device further comes at the expense of objective audio quality.

📝 Abstract
In this paper we propose a robust loudspeaker beamforming algorithm that enhances the performance of voice driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g., when music is playing loudly. The loudspeaker beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the device that implements automatic speech recognition for a voice driven application (VDA). The algorithm utilises a distortion measure based on human auditory perception to limit the distortion perceived by human listeners. Simulations and real-world experiments show that the proposed loudspeaker beamformer improves the speech recognition performance in all tested scenarios. Moreover, the algorithm allows the acoustic energy around the VDA device to be reduced further at the expense of reduced objective audio quality at the listener's location.
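
The trade-off described in the abstract can be illustrated with a small numerical sketch. This is not the paper's algorithm: the perceptual distortion measure is replaced here by a plain squared error at the listener, and the transfer functions `H_vda` and `H_lis`, the reference weights `w_ref`, and the trade-off weight `mu` are all hypothetical names and values chosen for illustration. For a single frequency bin, the sketch minimises the energy at the VDA microphones plus `mu` times the deviation from the unmodified playback at the listener.

```python
import numpy as np


def beamformer_weights(H_vda, H_lis, w_ref, mu):
    """Solve min_w ||H_vda w||^2 + mu * ||H_lis (w - w_ref)||^2 for one frequency bin.

    H_vda : (n_vda_mics, n_speakers) transfer functions, loudspeakers -> VDA device
    H_lis : (n_listener_points, n_speakers) transfer functions, loudspeakers -> listener
    w_ref : (n_speakers,) unmodified playback weights
    mu    : > 0, weight on the stand-in distortion term (larger keeps w closer to w_ref)
    """
    G_lis = H_lis.conj().T @ H_lis
    # Normal equations of the quadratic cost; A is assumed to be invertible.
    A = H_vda.conj().T @ H_vda + mu * G_lis
    b = mu * (G_lis @ w_ref)
    return np.linalg.solve(A, b)


def vda_energy(H_vda, w):
    """Acoustic energy at the VDA microphones for loudspeaker weights w."""
    return float(np.linalg.norm(H_vda @ w) ** 2)


def listener_distortion(H_lis, w, w_ref):
    """Squared-error stand-in for the perceptual distortion at the listener."""
    return float(np.linalg.norm(H_lis @ (w - w_ref)) ** 2)


# Toy scene with random (hypothetical) transfer functions: 4 loudspeakers,
# 2 VDA microphones, 3 control points around the listener.
rng = np.random.default_rng(0)
n_speakers = 4
H_vda = rng.standard_normal((2, n_speakers)) + 1j * rng.standard_normal((2, n_speakers))
H_lis = rng.standard_normal((3, n_speakers)) + 1j * rng.standard_normal((3, n_speakers))
w_ref = np.ones(n_speakers, dtype=complex)

for mu in (10.0, 1.0, 0.1):
    w = beamformer_weights(H_vda, H_lis, w_ref, mu)
    print(f"mu={mu:5.1f}  VDA energy={vda_energy(H_vda, w):8.4f}  "
          f"listener distortion={listener_distortion(H_lis, w, w_ref):8.4f}")
```

Lowering `mu` in this toy setup pushes the energy at the VDA device down while the listener-side error grows, which mirrors the trade-off the abstract describes. A perceptually motivated measure would weight the listener term differently across frequency.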
Problem

Research questions and friction points this paper is trying to address.

Speech Recognition
Noise Environment
User Experience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise Reduction
Speech Recognition Accuracy
Perceptual Audio Quality Preservation
Dimme de Groot
Multimedia Computing Group, EEMCS, Delft University of Technology, The Netherlands
Baturalp Karslioglu
Multimedia Computing Group, EEMCS, Delft University of Technology, The Netherlands
Odette Scharenborg
Full Professor, Delft University of Technology, The Netherlands
speech processing, automatic speech recognition, deep neural networks, artificial intelligence
Jorge Martinez
Multimedia Computing Group, EEMCS, Delft University of Technology, The Netherlands