AI Summary
This work addresses the challenge of conducting fine-grained, part-of-speech (PoS)-level error analysis in automatic speech recognition (ASR) for non-Latin script languages, where reliable word-level alignment is often unattainable. To overcome this limitation, the authors propose a language-agnostic automatic alignment framework that, for the first time, uniformly supports the three major writing systems: Abugida, Alphabetic, and Abjad. By integrating a general-purpose sequence alignment algorithm with standard PoS taggers, the framework enables a scalable and reproducible pipeline for PoS-level ASR error analysis. This approach effectively removes linguistic barriers in ASR diagnostics for non-Latin scripts and demonstrates practical utility: in multilingual experiments, insights derived from the analysis were successfully fed back into ASR training, yielding significant reductions in word error rate (WER).
Abstract
Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
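To make the pipeline described above concrete, here is a minimal sketch of the two steps: align a reference and a hypothesis at the word level, then attribute each error to the PoS tag of the reference word. It is an illustration under stated assumptions, not the authors' implementation. The alignment is a plain word-level Levenshtein alignment standing in for whatever general-purpose sequence aligner the paper uses; the function names `align` and `pos_error_counts` are hypothetical; and the PoS tags are supplied by hand rather than produced by a real tagger.

```python
from collections import Counter


def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"ok", "sub", "del", "ins"}. Generic edit-distance alignment,
    used here as a stand-in (assumption) for the paper's aligner.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion

    # Backtrace from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def pos_error_counts(ops, ref_tags):
    """Count reference words and errors per PoS tag.

    ref_tags holds one tag per reference word (hand-written here;
    a real pipeline would obtain them from a PoS tagger).
    """
    totals, errors = Counter(), Counter()
    k = 0  # index of the next reference word
    for op, _ref_word, _hyp_word in ops:
        if op == "ins":
            # An insertion has no reference word to attribute it to.
            errors["<INS>"] += 1
            continue
        tag = ref_tags[k]
        k += 1
        totals[tag] += 1
        if op != "ok":
            errors[tag] += 1
    return totals, errors


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]  # illustrative tags
    totals, errors = pos_error_counts(align(ref, hyp), tags)
    for tag in totals:
        print(tag, errors[tag] / totals[tag])
```

In this toy pair the alignment marks "sat" / "sit" as a substitution and the second "the" as a deletion, so one error each is attributed to VERB and DET. Whitespace tokenization is assumed throughout, which is why the sketch only applies to segmented writing systems such as those listed in the abstract.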