RAS: a Reliability Oriented Metric for Automatic Speech Recognition

📅 2026-04-27
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of conventional automatic speech recognition (ASR) systems: under noisy or ambiguous conditions they often produce highly confident yet erroneous transcriptions, a failure mode that standard word error rate metrics do not capture. To improve reliability, the authors propose an ASR framework that supports active abstention, enabling the model to withhold output on segments where it is uncertain. They introduce RAS, a reliability-oriented evaluation metric that jointly quantifies transcription informativeness and error avoidance, with the trade-off between the two calibrated by human preference. The abstention-aware model is trained via supervised bootstrapping followed by reinforcement learning, with policy optimization guided by a calibrated RAS score. Experimental results show substantial improvements in transcription reliability while maintaining competitive accuracy.

📝 Abstract
Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS, a reliability-oriented metric that balances transcription informativeness and error aversion, with its trade-off parameter calibrated by human preference. We then train an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning. Our experiments demonstrate substantial improvements in transcription reliability while maintaining competitive accuracy.
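To make the trade-off between informativeness and error aversion concrete, here is a minimal toy sketch. It is not the paper's RAS definition, which this page does not give. The scoring rule, the `<abstain>` placeholder token, the `error_penalty` parameter, and the assumption that reference and hypothesis are already aligned word by word are all illustrative choices made for this example.

```python
from typing import Sequence

ABSTAIN = "<abstain>"  # hypothetical placeholder token, not taken from the paper


def toy_reliability_score(reference: Sequence[str],
                          hypothesis: Sequence[str],
                          error_penalty: float) -> float:
    """Toy abstention-aware score. This is NOT the paper's RAS formula.

    Assumes the two sequences are pre-aligned word by word.
    Each correct word earns +1, each wrong word costs `error_penalty`,
    and each abstained word contributes 0.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be pre-aligned and equal in length")
    if not reference:
        return 0.0
    total = 0.0
    for ref_word, hyp_word in zip(reference, hypothesis):
        if hyp_word == ABSTAIN:
            continue  # abstaining earns nothing and costs nothing
        total += 1.0 if hyp_word == ref_word else -error_penalty
    return total / len(reference)


reference = ["turn", "left", "at", "main", "street"]
confident_wrong = ["turn", "left", "at", "maine", "treat"]
with_abstention = ["turn", "left", "at", ABSTAIN, ABSTAIN]

score_wrong = toy_reliability_score(reference, confident_wrong, error_penalty=2.0)
score_abstain = toy_reliability_score(reference, with_abstention, error_penalty=2.0)
```

With a positive `error_penalty`, the transcript that abstains on the two uncertain words scores higher than the one that guesses them wrongly. With `error_penalty=0.0` the two transcripts tie, which illustrates why a trade-off parameter of this kind has to be calibrated rather than fixed arbitrarily.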
Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition
Reliability
Abstention
Word Error Rate
Uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

abstention-aware ASR
reliability metric
RAS
reinforcement learning
transcription uncertainty
Wenbin Huang
X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University; Ministry of Education Key Laboratory of Artificial Intelligence; Jiangsu Key Laboratory of Language Computing, Shanghai, China
Yuhang Qiu
X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University; Ministry of Education Key Laboratory of Artificial Intelligence; Jiangsu Key Laboratory of Language Computing, Shanghai, China
Bohan Li
Shanghai Jiao Tong University
3D Vision, stereo matching, disparity regression
Yiwei Guo
Shanghai Jiao Tong University
Speech and Audio Processing, Speech Synthesis, Text-to-speech, Artificial Intelligence
Jing Peng
Shanghai Jiao Tong University
Automatic Speech Recognition, Speech Large Language Model
Hankun Wang
Shanghai Jiao Tong University
Speech Synthesis
Xie Chen
Shanghai Jiao Tong University <- Microsoft <- Cambridge University
Machine Learning, Speech Recognition, Speech Synthesis, Speech & Audio Processing
Kai Yu
X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University; Ministry of Education Key Laboratory of Artificial Intelligence; Jiangsu Key Laboratory of Language Computing, Shanghai, China