Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current automatic speech recognition (ASR) evaluation treats a single transcription convention as the “ground truth.” This systematically disadvantages speakers with atypical speech, such as people with aphasia, whose clinically meaningful disfluencies are counted as errors, and the paper argues that this constitutes epistemic injustice. The work critiques this “reference monism,” introduces the philosophical notion of the “hermeneutical gap,” formalizes a metric, Epistemic Injustice Distance (EID), to measure the cost of reference monism, and proposes reporting WER-Range, that is, performance across legitimate transcription conventions, instead of a single word error rate (WER). An empirical analysis on AphasiaBank shows that WER varies depending on which convention defines the ground truth.
📝 Abstract
Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But ground truth transcripts are not discovered - they are produced by human annotators following conventions that encode normative assumptions about which speech features matter. Different conventions (verbatim, non-verbatim, legal) produce different transcripts of identical speech and judge the same ASR output differently. This paper argues that reference monism - enforcing a single transcription convention as ground truth - commits epistemic injustice. Speakers with aphasia, whose speech includes clinically meaningful disfluencies, are systematically disadvantaged when evaluated against "clean" references that treat those disfluencies as errors. The harm is not merely differential performance, but that evaluative infrastructure lacks interpretive resources to recognize their contributions as legitimate. We develop a philosophical framework introducing the hermeneutical gap, formalize Epistemic Injustice Distance (EID) to measure reference monism's cost, and demonstrate empirically using AphasiaBank that WER varies depending on which convention defines ground truth. We propose WER-Range: reporting performance across legitimate conventions rather than assuming a single correct answer.
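As a rough illustration of the idea in the abstract, the sketch below scores one ASR hypothesis against references written under two conventions and reports the lowest and highest WER. It is not the paper's implementation: the function names (`wer`, `wer_range`), the min/max definition of the range, and the example transcripts are all assumptions made here for illustration, and EID is not sketched because its formula is not given in this listing.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def wer_range(references, hypothesis):
    """Assumed definition: (min WER, max WER, per-convention scores)."""
    scores = {name: wer(text, hypothesis) for name, text in references.items()}
    return min(scores.values()), max(scores.values()), scores


# Hypothetical transcripts of the same utterance under two conventions.
references = {
    "verbatim": "i um i want the the water",   # keeps disfluencies
    "non_verbatim": "i want the water",        # "clean" reference
}
hypothesis = "i want the water"                # hypothetical ASR output

low, high, per_convention = wer_range(references, hypothesis)
print(f"WER-Range: {low:.3f} to {high:.3f}")
for name, score in per_convention.items():
    print(f"  {name}: {score:.3f}")
```

In this toy case the same output is error-free against the clean reference but is charged three deletions against the verbatim one, which is the kind of convention-dependent judgment the abstract describes.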
Problem

Research questions and friction points this paper is trying to address.

reference monism
epistemic injustice
automatic speech recognition
transcription conventions
Word Error Rate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Epistemic Injustice Distance
WER-Range
reference monism
hermeneutical gap
ASR evaluation