Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of conducting fine-grained, part-of-speech (PoS)-level error analysis in automatic speech recognition (ASR) for languages written in non-Latin scripts, where existing alignment tools are often unreliable. The authors propose a language-agnostic automatic alignment framework that applies across ASR architectures and covers three major segmented writing systems: Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). Combining this alignment with standard PoS taggers yields a scalable and reproducible pipeline for PoS-wise ASR error analysis. The authors further show that the resulting error information can be used during ASR training to improve metrics such as word error rate (WER).

📝 Abstract

Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
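To make the idea of alignment-based, PoS-wise error analysis concrete, here is a minimal Python sketch. It is not the authors' mechanism, which the abstract does not specify. It assumes whitespace-tokenized text, Unicode NFC normalization, a plain word-level Levenshtein alignment, and a hypothetical toy lookup table (`TOY_TAGS`) standing in for a real PoS tagger. The function names (`normalize`, `align`, `wer`, `pos_error_counts`) are illustrative only.

```python
import unicodedata
from collections import Counter


def normalize(text):
    """NFC-normalize and split on whitespace (assumes a space-segmented script)."""
    return unicodedata.normalize("NFC", text).split()


def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"ok", "sub", "del", "ins"}; the missing side is None.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(ops, ref_len):
    """WER = (substitutions + deletions + insertions) / reference length."""
    return sum(op != "ok" for op, _, _ in ops) / ref_len


def pos_error_counts(ops, tag_of):
    """Count substitutions and deletions per PoS tag of the reference word."""
    counts = Counter()
    for op, ref_word, _ in ops:
        if op in ("sub", "del"):
            counts[tag_of(ref_word)] += 1
    return counts


# Hypothetical toy tag table; a real pipeline would tag the full sentence in context.
TOY_TAGS = {"मैं": "PRON", "स्कूल": "NOUN", "जाता": "VERB", "हूँ": "AUX"}

ref_words = normalize("मैं स्कूल जाता हूँ")
hyp_words = normalize("मैं स्कूल जाती हूँ")
ops = align(ref_words, hyp_words)
errors_by_pos = pos_error_counts(ops, lambda w: TOY_TAGS.get(w, "X"))
print(wer(ops, len(ref_words)), dict(errors_by_pos))
```

In this sketch insertions are left out of the per-tag counts because they have no reference word to tag. Attributing them to the hypothesis word's tag is an equally reasonable choice.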

Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition

non-Latin scripts

Part-of-Speech tagging

error analysis

alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

language-agnostic alignment

non-Latin scripts

PoS-based error analysis