Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses

📅 2025-09-17
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Smart glasses suffer from severe degradation in wearer speech recognition (WSR) performance, and from error propagation into downstream NLP tasks, because of interference from bystander speech (side-talk) in real-world environments. To address this, we propose a multi-channel differential automatic speech recognition (ASR) framework. Our method integrates beamforming, dynamic microphone selection, and a lightweight side-talk detection model to generate robust differential signals at the front end. Moreover, we are the first to embed the differential mechanism into the end-to-end ASR joint optimization pipeline, enabling co-optimized interference suppression and speech recognition. Evaluated on both simulated and real-world datasets, our framework achieves up to an 18.0% relative reduction in word error rate (WER), significantly enhancing the stability and practicality of WSR under challenging acoustic conditions.

📝 Abstract
With the growing adoption of wearable devices such as smart glasses for AI assistants, wearer speech recognition (WSR) is becoming increasingly critical to next-generation human-computer interfaces. However, in real environments, interference from side-talk speech remains a significant challenge to WSR and may cause accumulated errors for downstream tasks such as natural language processing. In this work, we introduce a novel multi-channel differential automatic speech recognition (ASR) method for robust WSR on smart glasses. The proposed system takes differential inputs from different frontends that complement each other to improve the robustness of WSR, including a beamformer, microphone selection, and a lightweight side-talk detection model. Evaluations on both simulated and real datasets demonstrate that the proposed system outperforms the traditional approach, achieving up to an 18.0% relative reduction in word error rate.
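
The headline metric, a relative reduction in word error rate, can be made concrete with a short sketch. The function names and the example WER values below are illustrative assumptions, not figures or code from the paper; only the definition of WER (word-level edit distance divided by reference length) and of relative reduction is standard.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Fraction by which the proposed system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the arithmetic.
example_reduction = relative_wer_reduction(0.10, 0.082)
```

Under these made-up values, a baseline WER of 10.0% falling to 8.2% corresponds to an 18% relative reduction, even though the absolute change is only 1.8 percentage points.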
Problem

Research questions and friction points this paper is trying to address.

Robust wearer speech recognition on smart glasses
Addressing side-talk interference in speech recognition
Improving accuracy through multi-channel differential ASR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-channel differential ASR system
Complementary frontends with beamformer and selection
Lightweight side-talk detection model
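
As a rough illustration of how complementary frontends might feed one recognizer, the sketch below pairs a delay-and-sum beamformer with energy-based microphone selection and returns both outputs as parallel input streams. This page does not describe the paper's actual frontends, so every function name, the integer steering delays, and the energy criterion are assumptions made for illustration; the side-talk detection model is omitted entirely.

```python
from typing import List, Sequence

Signal = List[float]


def delay_and_sum(channels: Sequence[Signal], delays: Sequence[int]) -> Signal:
    """Advance each channel by its integer sample delay, then average."""
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay  # sample that aligns with time t after steering
            if 0 <= src < n:
                out[t] += channel[src]
    return [value / len(channels) for value in out]


def select_microphone(channels: Sequence[Signal]) -> int:
    """Return the index of the channel with the highest mean energy."""
    energies = [sum(x * x for x in ch) / len(ch) for ch in channels]
    return max(range(len(channels)), key=energies.__getitem__)


def frontend_streams(channels: Sequence[Signal], delays: Sequence[int]) -> List[Signal]:
    """Stack the beamformed signal and the selected raw channel (assumed layout)."""
    beamformed = delay_and_sum(channels, delays)
    selected = list(channels[select_microphone(channels)])
    return [beamformed, selected]


# Toy two-microphone capture: the same impulse arrives one sample later on mic 1.
mics = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
streams = frontend_streams(mics, delays=[0, 1])
```

In this toy case the steering delays realign the impulse across both microphones, so the beamformed stream keeps it at full amplitude. A downstream model that receives both streams could, in principle, learn from the contrast between them, which is one plausible reading of "differential" inputs.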
Authors

Yufeng Yang
The Ohio State University, USA; Meta, USA
Yiteng Huang
Meta, USA
Yong Xu
Meta, USA
Li Wan
Amazon AWS
Machine Learning, Neural Networks
Suwon Shon
Meta, USA
Yang Liu
Meta, USA
Yifeng Fan
Meta, USA
Zhaojun Yang
Research Scientist, Facebook
Affective computing, machine learning, multimodal modeling, spoken dialog system
Olivier Siohan
Meta, USA
Yue Liu
Meta, USA
Ming Sun
Meta, USA
Florian Metze
Carnegie Mellon University; Meta AI
speech recognition, video understanding