🤖 AI Summary
Conventional ASR evaluation metrics such as Word Error Rate (WER) over-penalize morphosyntactic variations that preserve semantic meaning, which is particularly problematic for morphologically rich, word-order-flexible Indian languages. Method: We propose LASER, the first multilingual, semantics-aware ASR scoring framework that leverages large language model (LLM) in-context learning. LASER combines zero-shot scoring with Gemini 2.5 Pro and a fine-tuned Llama 3 model, trained on word-pair-level data, that classifies which type of error penalty to apply; Hindi-based prompts generalize across Indian languages, enabling lightweight deployment. Contribution/Results: LASER achieves 94% correlation with human judgments on Hindi and 89% accuracy in error-type classification. This work pioneers the systematic integration of LLM in-context learning into semantic ASR evaluation, improving fairness and fine-grained error analysis, especially for low-resource languages.
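To make the motivation concrete, here is a minimal WER computation (word-level Levenshtein distance) showing how a surface-level metric penalizes a meaning-preserving word-order change. The Hindi example sentences are illustrative only, not taken from the paper.

```python
# Minimal WER: Levenshtein edit distance over words, divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

# Same meaning ("I had gone to the market yesterday"), two valid word orders:
ref = "main kal bazaar gaya tha"
hyp = "kal main bazaar gaya tha"
print(f"WER = {wer(ref, hyp):.2f}")  # swapping two words counts as 2 edits -> WER = 0.40
```

Despite identical semantics, WER charges two substitutions, which is the kind of penalty a semantics-aware score like LASER is designed to avoid.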
📝 Abstract
Standard ASR evaluation metrics like Word Error Rate (WER) tend to unfairly penalize morphological and syntactic variations that do not significantly alter sentence semantics. We introduce LASER, an LLM-based scoring rubric that leverages state-of-the-art LLMs' in-context learning abilities to learn from prompts with detailed examples. Hindi LASER scores computed with Gemini 2.5 Pro achieved a very high correlation of 94% with human annotations. Hindi examples in the prompt were also effective for analyzing errors in other Indian languages such as Marathi, Kannada, and Malayalam. We also demonstrate how a smaller LLM like Llama 3 can be fine-tuned on word-pair examples derived from reference transcripts and ASR predictions to predict which kind of penalty should be applied, with close to 89% accuracy.
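The word-pair examples used to fine-tune the smaller LLM can be derived by aligning the reference against the ASR hypothesis and extracting substituted word pairs. Below is a hypothetical sketch of that extraction step using `difflib`; the alignment method and the Hindi example are assumptions for illustration, and the actual penalty labels would come from the LASER rubric.

```python
# Sketch: derive (reference_word, hypothesis_word) substitution pairs from an
# aligned reference/ASR-hypothesis sentence pair. Each pair would then be
# labeled with a penalty type and used as classifier training data.
from difflib import SequenceMatcher

def word_pairs(reference: str, hypothesis: str) -> list[tuple[str, str]]:
    ref, hyp = reference.split(), hypothesis.split()
    pairs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if tag == "replace":
            # Pair up substituted words position by position.
            pairs.extend(zip(ref[i1:i2], hyp[j1:j2]))
    return pairs

# Illustrative morphological substitutions (singular vs. plural inflection):
print(word_pairs("ladka school gaya", "ladke school gaye"))
# [('ladka', 'ladke'), ('gaya', 'gaye')]
```

Pairs like `('gaya', 'gaye')` differ only in inflection, so a trained classifier can assign them a lighter penalty than a substitution that changes the meaning.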