Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a limitation of conventional automatic speech recognition (ASR) evaluation: it relies heavily on word error rate (WER), which does not capture the grammatical or semantic nature of transcription errors. The authors propose two complementary metrics. Part-of-Speech Error Rate (POSER) targets grammatical correctness, and Embedding Error Rate (EmbER) targets semantic fidelity by weighting WER errors according to the semantic distance between the reference word and the wrongly transcribed word. Applied to transcription hypotheses rescored with language models, the two metrics show how rescoring changes the linguistic quality of transcriptions in ways that WER alone does not reveal.
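To make the contrast between WER and POSER concrete, here is a minimal Python sketch. It assumes POSER is computed like WER but over part-of-speech tag sequences instead of words; the summary does not spell out the formula, so this is an illustration rather than the authors' implementation. The names `TOY_TAGS`, `tag`, `wer`, and `poser` are hypothetical, and the tiny lexicon stands in for a real tagger.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens; each substitution, deletion, or insertion costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitute = prev[j - 1] + (r != h)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical toy lexicon (assumption): a real system would use a trained POS tagger.
TOY_TAGS: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "mat": "NOUN", "hat": "NOUN",
    "sat": "VERB", "on": "ADP",
}


def tag(words: Sequence[str]) -> List[str]:
    """Map each word to its toy tag; unknown words get the placeholder tag 'X'."""
    return [TOY_TAGS.get(w, "X") for w in words]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word error rate: error rate over the word sequences."""
    return error_rate(ref, hyp)


def poser(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed POSER: the same error rate, computed over POS tag sequences."""
    return error_rate(tag(ref), tag(hyp))


reference = "the cat sat on the mat".split()
hypothesis = "a cat sat on the hat".split()
print(f"WER   = {wer(reference, hypothesis):.3f}")
print(f"POSER = {poser(reference, hypothesis):.3f}")
```

In this toy example the hypothesis contains two word substitutions out of six reference words, but both substitutions preserve the part of speech, so the tag-level rate is zero while the WER is not.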

📝 Abstract

Evaluating automatic speech recognition (ASR) systems is a classical but difficult and still open problem, which often boils down to focusing only on the word error rate (WER). However, this metric suffers from many limitations and does not allow an in-depth analysis of automatic transcription errors. In this paper, we propose to study and understand the impact of rescoring using language models in ASR systems by means of several metrics often used in other natural language processing (NLP) tasks in addition to the WER. In particular, we introduce two measures related to morpho-syntactic and semantic aspects of transcribed words: 1) the POSER (Part-of-speech Error Rate), which should highlight the grammatical aspects, and 2) the EmbER (Embedding Error Rate), a measurement that modifies the WER by providing a weighting according to the semantic distance of the wrongly transcribed words. These metrics illustrate the linguistic contributions of the language models that are applied during a posterior rescoring step on transcription hypotheses.
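The weighting idea behind EmbER can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the paper's implementation: a substitution costs less when the reference and hypothesis words have similar embeddings. The similarity threshold (`0.4`), the reduced cost (`0.1`), the `TOY_VECTORS` table, and the function names are all assumptions chosen for the example. The sketch also minimises the weighted cost directly, which may differ from reweighting the errors of a standard WER alignment.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical 2-D embeddings (assumption): a real system would load pretrained vectors.
TOY_VECTORS: Dict[str, Tuple[float, ...]] = {
    "car": (1.0, 0.0),
    "cars": (0.9, 0.1),
    "bar": (0.0, 1.0),
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def substitution_cost(ref_word: str, hyp_word: str,
                      threshold: float = 0.4, reduced: float = 0.1) -> float:
    """0 for a match, `reduced` for a semantically close substitution, 1 otherwise."""
    if ref_word == hyp_word:
        return 0.0
    u, v = TOY_VECTORS.get(ref_word), TOY_VECTORS.get(hyp_word)
    if u is not None and v is not None and cosine(u, v) > threshold:
        return reduced
    return 1.0


def ember(ref: Sequence[str], hyp: Sequence[str],
          threshold: float = 0.4, reduced: float = 0.1) -> float:
    """Weighted edit distance normalised by the reference length (assumed EmbER form)."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            substitute = prev[j - 1] + substitution_cost(r, h, threshold, reduced)
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1] / len(ref)


reference = "i bought a car".split()
print(ember(reference, "i bought a cars".split()))  # semantically close substitution
print(ember(reference, "i bought a bar".split()))   # semantically distant substitution
```

Both hypotheses contain exactly one substitution, so their WER is identical. Under the assumed weighting, the semantically close error ("cars" for "car") is penalised far less than the distant one ("bar" for "car"), which is the distinction the abstract attributes to EmbER.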

Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition

Evaluation

Word Error Rate

Language Model Rescoring

Transcription Errors

Innovation

Methods, ideas, or system contributions that make the work stand out.

POSER

EmbER

Language Model Rescoring