🤖 AI Summary
Wearer speech recognition (WSR) on smart glasses degrades severely under interference from bystander speech (side-talk) in real-world environments, and these errors propagate to downstream NLP tasks. To address this, we propose a multi-channel differential automatic speech recognition (ASR) framework. Our method integrates beamforming, dynamic microphone selection, and a lightweight side-talk detection model to generate robust differential signals at the front end. Moreover, we are the first to embed the differential mechanism into the end-to-end ASR joint optimization pipeline, enabling jointly optimized interference suppression and speech recognition. Evaluated on both simulated and real-world datasets, our framework achieves up to an 18.0% relative reduction in word error rate (WER), significantly enhancing the stability and practicality of WSR under challenging acoustic conditions.
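To make the front-end pipeline concrete, here is a minimal numpy sketch of how the three components could produce differential inputs for the ASR model. The delay-and-sum beamformer, energy-based microphone selection, and energy-ratio gate are illustrative stand-ins, not the paper's actual beamformer, selection criterion, or learned side-talk detector, and names such as `differential_frontend` are hypothetical.

```python
import numpy as np

def delay_and_sum(mics: np.ndarray, delays_s: np.ndarray, sr: int) -> np.ndarray:
    """Toy delay-and-sum beamformer: shift each channel by an integer sample
    delay toward the wearer's mouth, then average across channels."""
    aligned = [np.roll(ch, -int(round(d * sr))) for ch, d in zip(mics, delays_s)]
    return np.mean(aligned, axis=0)

def select_interference_mic(mics: np.ndarray) -> np.ndarray:
    """Toy dynamic microphone selection: treat the highest-energy channel as
    the one most exposed to side-talk (a stand-in criterion)."""
    return mics[np.argmax(np.sum(mics ** 2, axis=1))]

def side_talk_gate(wearer: np.ndarray, interferer: np.ndarray,
                   frame: int = 400) -> np.ndarray:
    """Stand-in for the lightweight side-talk detector: a per-frame energy
    ratio used as a soft gate (the real detector is a learned model)."""
    n = min(len(wearer), len(interferer)) // frame * frame
    w = wearer[:n].reshape(-1, frame)
    i = interferer[:n].reshape(-1, frame)
    ratio = np.sum(i ** 2, axis=1) / (np.sum(w ** 2, axis=1) + 1e-8)
    return 1.0 / (1.0 + ratio)  # ~1 when the wearer dominates, ~0 otherwise

def differential_frontend(mics: np.ndarray, delays_s: np.ndarray, sr: int):
    """Combine the three front ends into differential inputs for the ASR model."""
    wearer_beam = delay_and_sum(mics, delays_s, sr)
    interferer = select_interference_mic(mics)
    # Differential signal: wearer-focused beam minus the interference-dominated
    # channel, so bystander energy partially cancels.
    diff = wearer_beam - interferer
    gate = side_talk_gate(wearer_beam, interferer)
    # An end-to-end ASR model would consume these streams and be optimized
    # jointly with the front end.
    return wearer_beam, diff, gate

if __name__ == "__main__":
    sr = 16000
    mics = np.random.randn(4, sr)               # 4 channels, 1 s of audio
    delays = np.array([0.0, 1e-4, 2e-4, 3e-4])  # hypothetical steering delays
    beam, diff, gate = differential_frontend(mics, delays, sr)
    print(beam.shape, diff.shape, gate.shape)
```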
📝 Abstract
With the growing adoption of wearable devices such as smart glasses for AI assistants, wearer speech recognition (WSR) is becoming increasingly critical to next-generation human-computer interfaces. However, in real environments, interference from side-talk speech remains a significant challenge to WSR and can cause errors to accumulate in downstream tasks such as natural language processing. In this work, we introduce a novel multi-channel differential automatic speech recognition (ASR) method for robust WSR on smart glasses. The proposed system takes differential inputs from complementary frontends, including a beamformer, microphone selection, and a lightweight side-talk detection model, to improve the robustness of WSR. Evaluations on both simulated and real datasets demonstrate that the proposed system outperforms the traditional approach, achieving up to an 18.0% relative reduction in word error rate.
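For reference, relative word error rate reduction is computed as (baseline − proposed) / baseline. The short sketch below uses hypothetical WER values chosen only to illustrate what an 18.0% relative reduction means; they are not numbers reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Relative reduction in word error rate, expressed as a fraction."""
    return (baseline_wer - proposed_wer) / baseline_wer

# Hypothetical illustration: dropping from 10.0% to 8.2% WER
# corresponds to an 18.0% relative reduction.
print(f"{relative_wer_reduction(10.0, 8.2):.1%}")  # -> 18.0%
```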