Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

📅 2026-05-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of conducting fine-grained, part-of-speech (PoS)-level error analysis in automatic speech recognition (ASR) for languages written in non-Latin scripts, where existing word-level alignment tools are often unreliable. To overcome this limitation, the authors propose a language-agnostic automatic alignment framework that uniformly supports three major segmented writing systems: Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). By combining this alignment with standard PoS taggers, the framework enables a scalable and reproducible pipeline for PoS-level ASR error analysis. The authors further show that the resulting error information can be fed back into ASR training to improve metrics such as word error rate (WER).
📝 Abstract
Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
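To make the idea of PoS-wise error characterization concrete, the following is a minimal sketch, not the authors' method. It assumes a plain word-level Levenshtein alignment, whitespace tokenization after Unicode NFC normalization, and reference PoS tags supplied by some external tagger. The function names (tokenize, align, pos_error_rates) and the tags in the example are hypothetical and chosen only for illustration.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """NFC-normalize, then split on whitespace (assumes a segmented script)."""
    return unicodedata.normalize("NFC", text).split()


def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"ok", "sub", "del", "ins"}. This is a generic edit-distance
    alignment, assumed here for illustration only.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, preferring the diagonal.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def pos_error_rates(ref, ref_tags, hyp):
    """Fraction of reference words per PoS tag that were substituted or deleted."""
    totals = Counter(ref_tags)
    errors = Counter()
    tags = iter(ref_tags)
    for op, _, _ in align(ref, hyp):
        if op == "ins":
            continue  # insertions have no reference word, hence no reference tag
        tag = next(tags)
        if op != "ok":
            errors[tag] += 1
    return {tag: errors[tag] / totals[tag] for tag in totals}


# Hypothetical Hindi example; the tags are hand-assigned, not tagger output.
ref = tokenize("मैं स्कूल जाता हूँ")
hyp = tokenize("मैं स्कूल जाती हूँ")
ref_tags = ["PRON", "NOUN", "VERB", "AUX"]
rates = pos_error_rates(ref, ref_tags, hyp)
print(rates)
```

In this toy example the only mismatch is the verb form, so the single substitution is attributed to the VERB tag and every other tag has an error rate of zero. Insertions are skipped here because they have no reference-side tag; a fuller analysis might tag the hypothesis as well and report insertions separately.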
Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition
non-Latin scripts
Part-of-Speech tagging
error analysis
alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-agnostic alignment
non-Latin scripts
PoS-based error analysis
ASR error characterization
automatic speech recognition