An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional automatic speech recognition (ASR) evaluation using word error rate (WER) fails to capture the practical impact of ASR errors on downstream large language model (LLM)-driven tasks. Method: We propose a task-oriented ASR evaluation framework that (1) systematically classifies ASR error types and analyzes their contextual reparability within LLM prompts; (2) defines a multidimensional metric integrating semantic severity of errors, LLM-based correction success rate, and end-task completion accuracy; and (3) validates the framework empirically on representative speech-to-LLM pipelines—including voice command execution and meeting summary generation. Results: Our framework significantly outperforms WER in reflecting ASR effectiveness in real-world LLM applications. It provides an interpretable, quantifiable assessment grounded in downstream task performance, enabling principled, task-aware ASR model development and optimization.

📝 Abstract
Automatic Speech Recognition (ASR) plays a crucial role in human-machine interaction and serves as an interface for a wide range of applications. Traditionally, ASR performance has been evaluated using Word Error Rate (WER), a metric that quantifies the number of insertions, deletions, and substitutions in the generated transcriptions. However, with the increasing adoption of powerful Large Language Models (LLMs) as the core processing component in various applications, the significance of different types of ASR errors in downstream tasks warrants further exploration. In this work, we analyze the capabilities of LLMs to correct errors introduced by ASRs and propose a new measure to evaluate ASR performance for LLM-powered applications.
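For context, the conventional WER baseline described in the abstract is a word-level Levenshtein distance normalized by reference length. A minimal sketch (the function name and whitespace tokenization are illustrative choices, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words, via edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,              # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that WER treats all errors equally, which is precisely the limitation the paper targets: a substitution that an LLM can trivially repair from context counts the same as one that derails the downstream task.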
Problem

Research questions and friction points this paper is trying to address.

Evaluating ASR performance in LLM-powered applications
Assessing impact of ASR errors on downstream LLM tasks
Proposing new ASR metric for LLM-based systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing LLMs' capability to correct ASR errors
Proposing new ASR performance measure for LLMs
Exploring ASR error impact on downstream tasks
Sujith Pulikodan
ARTPARK@IISc
Statistical Estimation · Machine Learning · Speech Processing
Sahapthan K
Department of Mathematics, Indian Institute of Science, India
Prasanta Kumar Ghosh
Associate Professor, Indian Institute of Science (IISc), Bangalore
Human-centered signal and information processing
Visruth Sanka
AI & Robotics Technology Park (ARTPARK), I-Hub @ IISc, India
Nihar Desai
AI & Robotics Technology Park (ARTPARK), I-Hub @ IISc, India