WER We Stand: Benchmarking Urdu ASR Models

📅 2024-09-17
🏛️ International Conference on Computational Linguistics
📈 Citations: 4
✨ Influential: 1
🤖 AI Summary
This work addresses the challenges of evaluating automatic speech recognition (ASR) for low-resource Urdu. We systematically benchmark three state-of-the-art model families (Whisper, MMS, and Seamless-M4T) on both read and conversational speech. To enable rigorous evaluation, we introduce the first dedicated Urdu conversational speech benchmark. Performance is assessed using word error rate (WER) together with fine-grained error analysis (substitutions, insertions, deletions), exposing the limitations of standard metrics when faced with dialectal vocabulary, coarticulation, and non-normalized text. Key contributions are: (1) empirical evidence that Seamless-large excels on read speech while Whisper-large achieves superior performance on conversational speech; (2) evidence that a robust Urdu text normalization pipeline is critical for reliable ASR evaluation; and (3) methodological insights and open data infrastructure to advance principled ASR evaluation for low-resource languages.

๐Ÿ“ Abstract
This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. We analyze the performance of three ASR model families, Whisper, MMS, and Seamless-M4T, using Word Error Rate (WER), along with a detailed examination of the most frequently misrecognized words and error types, including insertions, deletions, and substitutions. Our analysis is conducted using two types of datasets: read speech and conversational speech. Notably, we present the first conversational speech dataset designed for benchmarking Urdu ASR models. We find that Seamless-large outperforms other ASR models on the read speech dataset, while Whisper-large performs best on the conversational speech dataset. Furthermore, this evaluation highlights the complexities of assessing ASR models for low-resource languages like Urdu using quantitative metrics alone and emphasizes the need for a robust Urdu text normalization system. Our findings contribute valuable insights for developing robust ASR systems for low-resource languages like Urdu.
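The evaluation rests on WER and its breakdown into substitutions, deletions, and insertions. As a reference point for how those quantities relate, here is a minimal sketch of WER computed via word-level edit distance with a backtrace to count each error type. This is illustrative only; the paper's exact scoring tooling is not specified here.

```python
def wer_breakdown(reference: str, hypothesis: str):
    """Return (WER, substitutions, deletions, insertions) between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    # Backtrace through the table to attribute each edit to an error type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    # WER = (S + D + I) / number of reference words
    wer = (subs + dels + ins) / max(n, 1)
    return wer, subs, dels, ins
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, one of the reasons the paper cautions against relying on the quantitative metric alone.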
Problem

Research questions and friction points this paper is trying to address.

Evaluating performance of Urdu ASR models using WER metrics
Comparing ASR models on read and conversational speech datasets
Addressing challenges in assessing low-resource language ASR systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates Whisper, MMS, Seamless-M4T using WER
Introduces first Urdu conversational speech dataset
Highlights need for Urdu text normalization
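To make the normalization point concrete, here is a minimal sketch of the kind of Urdu text normalization applied before WER scoring: unifying visually identical Arabic and Urdu code points and stripping optional short-vowel diacritics. The specific character mappings below are common conventions, not the paper's actual pipeline, which is not specified here.

```python
import re
import unicodedata

# Map Arabic code points to their preferred Urdu equivalents
# (visually identical, but distinct code points inflate WER).
CHAR_MAP = {
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh
    "\u0643": "\u06A9",  # Arabic kaf -> keheh
    "\u0629": "\u06C3",  # teh marbuta -> teh marbuta goal
}

# Harakat (short-vowel diacritics) and superscript alef are
# optional in written Urdu, so they are dropped before scoring.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def normalize_urdu(text: str) -> str:
    """Normalize an Urdu transcript for fairer WER comparison."""
    text = unicodedata.normalize("NFC", text)
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    text = DIACRITICS.sub("", text)
    return " ".join(text.split())  # collapse whitespace
```

Without such canonicalization, a reference and hypothesis that read identically can still be scored as word errors, which is one way quantitative metrics mislead for non-normalized Urdu text.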