LASER: An LLM-based ASR Scoring and Evaluation Rubric

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional ASR evaluation metrics (e.g., WER) over-penalize morphosyntactic variations that preserve semantic meaning, a problem that is especially acute for morphologically rich, word-order-flexible Indian languages. Method: We propose LASER, the first multilingual, semantics-aware ASR scoring framework leveraging large language model (LLM) in-context learning. LASER combines zero-shot scoring using Gemini 2.5 Pro with a fine-tuned Llama 3 model trained on word-pair-level data to classify error penalty types; Hindi-based prompts generalize across Indian languages, enabling lightweight deployment. Contribution/Results: LASER achieves 94% correlation with human judgments and 89% accuracy in error-type classification. This work pioneers the systematic integration of LLM in-context learning into ASR semantic evaluation, improving fairness and fine-grained error analysis, especially for low-resource languages.
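The penalty-type idea can be illustrated with a minimal sketch: instead of WER's flat cost of 1 per substitution, each aligned word pair receives a class-dependent weight. The weights and the prefix-based stand-in classifier below are illustrative assumptions; in LASER the classification is performed by LLMs, not a hand-written rule.

```python
# Hypothetical sketch of penalty-weighted scoring; weights and the toy
# classifier are assumptions for illustration, not the paper's rubric.
import difflib

# Assumed weights: no penalty for identical words, a reduced penalty for
# morphological variants, full penalty for semantic errors.
PENALTIES = {"same": 0.0, "morph_variant": 0.25, "semantic_error": 1.0}

def classify_pair(ref_word: str, hyp_word: str) -> str:
    """Toy stand-in for the LLM classifier: a shared 4-character prefix
    is treated as a morphological variant, anything else as semantic."""
    if ref_word == hyp_word:
        return "same"
    if ref_word[:4] == hyp_word[:4]:  # crude shared-"stem" check
        return "morph_variant"
    return "semantic_error"

def weighted_score(reference: list[str], hypothesis: list[str]) -> float:
    """Like WER, but substitutions carry a class-dependent weight."""
    total = 0.0
    sm = difflib.SequenceMatcher(a=reference, b=hypothesis)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            continue
        if op == "replace" and (i2 - i1) == (j2 - j1):
            for r, h in zip(reference[i1:i2], hypothesis[j1:j2]):
                total += PENALTIES[classify_pair(r, h)]
        else:  # insertions/deletions keep the flat unit penalty
            total += max(i2 - i1, j2 - j1)
    return total / max(len(reference), 1)

ref = "the runners finished quickly".split()
hyp = "the running finished slowly".split()
print(weighted_score(ref, hyp))  # 0.3125, vs. a flat WER of 0.5
```

Here "runners" vs. "running" is charged only 0.25 as a morphological variant, while "quickly" vs. "slowly" keeps the full penalty of 1, so the score reflects the preserved meaning better than flat WER would.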

📝 Abstract
Standard ASR evaluation metrics like Word Error Rate (WER) tend to unfairly penalize morphological and syntactic nuances that do not significantly alter sentence semantics. We introduce an LLM-based scoring rubric, LASER, that leverages state-of-the-art LLMs' in-context learning abilities to learn from prompts with detailed examples. Hindi LASER scores using Gemini 2.5 Pro achieved a very high correlation score of 94% with human annotations. Hindi examples in the prompt were also effective in analyzing errors in other Indian languages such as Marathi, Kannada and Malayalam. We also demonstrate how a smaller LLM like Llama 3 can be fine-tuned on word-pair examples derived from reference and ASR predictions to predict what kind of penalty should be applied, with close to 89% accuracy.
Problem

Research questions and friction points this paper is trying to address.

Evaluating ASR systems beyond standard WER metrics
Addressing unfair penalization of linguistic nuances in ASR
Developing LLM-based scoring for multiple Indian languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based ASR scoring using in-context learning
Cross-language error analysis with Hindi examples
Fine-tuning smaller LLMs for penalty prediction accuracy
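The word-pair training data mentioned above can be sketched as an alignment step: substituted (reference, hypothesis) word pairs are extracted from each ASR output and then labeled with a penalty type for fine-tuning. The function name and the use of `difflib` alignment are assumptions for illustration; the paper does not specify this exact pipeline.

```python
# Hypothetical sketch of extracting word-pair examples from a reference
# transcript and an ASR prediction, as candidate inputs to a penalty
# classifier like the fine-tuned Llama 3 described in the paper.
import difflib

def extract_word_pairs(reference: str, hypothesis: str) -> list[tuple[str, str]]:
    """Align reference and ASR output at the word level and return the
    substituted (reference_word, hypothesis_word) pairs."""
    ref, hyp = reference.split(), hypothesis.split()
    pairs = []
    sm = difflib.SequenceMatcher(a=ref, b=hyp)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        # Keep only one-to-one substitutions; insertions and deletions
        # have no counterpart word to pair with.
        if op == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(ref[i1:i2], hyp[j1:j2]))
    return pairs

pairs = extract_word_pairs("she walked to the market", "she walks to a market")
print(pairs)  # [('walked', 'walks'), ('the', 'a')]
```

Each extracted pair would then be annotated with a penalty category (e.g., acceptable variant vs. semantic error) to form the fine-tuning set.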