🤖 AI Summary
Conventional automatic speech recognition (ASR) evaluation using word error rate (WER) fails to capture the practical impact of ASR errors on downstream large language model (LLM)-driven tasks. Method: We propose a task-oriented ASR evaluation framework that (1) systematically classifies ASR error types and analyzes their contextual reparability within LLM prompts; (2) defines a multidimensional metric integrating semantic severity of errors, LLM-based correction success rate, and end-task completion accuracy; and (3) validates the framework empirically on representative speech-to-LLM pipelines—including voice command execution and meeting summary generation. Results: Our framework significantly outperforms WER in reflecting ASR effectiveness in real-world LLM applications. It provides an interpretable, quantifiable assessment grounded in downstream task performance, enabling principled, task-aware ASR model development and optimization.
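The summary names three dimensions for the proposed metric: semantic severity of errors, LLM-based correction success rate, and end-task completion accuracy. A minimal sketch of how such a composite score might be combined is below; the function name, weights, and linear form are illustrative assumptions, not the paper's actual definition.

```python
# Hypothetical composite of the three dimensions named in the summary.
# Weights and the linear formula are assumptions for illustration only.
def task_oriented_score(semantic_severity: float,
                        correction_success: float,
                        task_completion: float,
                        weights=(0.2, 0.3, 0.5)) -> float:
    """All inputs in [0, 1]; severity acts as a penalty, the others as rewards."""
    w_sev, w_corr, w_task = weights
    return (w_sev * (1.0 - semantic_severity)   # less severe errors score higher
            + w_corr * correction_success        # errors the LLM can repair count less
            + w_task * task_completion)          # downstream task success dominates
```

In this sketch, end-task completion carries the largest weight, reflecting the summary's emphasis on grounding the assessment in downstream task performance.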
📝 Abstract
Automatic Speech Recognition (ASR) plays a crucial role in human-machine interaction and serves as an interface for a wide range of applications. Traditionally, ASR performance has been evaluated using Word Error Rate (WER), a metric that quantifies the number of insertions, deletions, and substitutions in the generated transcriptions. However, with the increasing adoption of powerful Large Language Models (LLMs) as the core processing component in various applications, the significance of different types of ASR errors in downstream tasks warrants further exploration. In this work, we analyze the capabilities of LLMs to correct errors introduced by ASR systems and propose a new measure to evaluate ASR performance for LLM-powered applications.
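The WER baseline the abstract refers to is the word-level edit distance (insertions, deletions, substitutions) normalized by the reference length. A standard dynamic-programming sketch, with a hypothetical function name:

```python
# Word Error Rate via word-level Levenshtein distance (standard definition,
# not code from the paper).
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                              # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                              # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that WER treats every edit equally, e.g. `wer("turn on the lights", "turn of the light")` counts two substitutions regardless of whether an LLM could trivially repair them from context, which is exactly the limitation the abstract highlights.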