Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR

2026-03-08
AI Summary
This study addresses the marginalization of Newari (Nepal Bhasha) in automatic speech recognition (ASR), which stems from the extreme scarcity of annotated speech data. The authors construct the first manually transcribed Newari speech corpus in Devanagari script, totaling 5.39 hours, and propose cross-lingual transfer from Nepali, a geographically and linguistically proximate language, as an alternative to large-scale multilingual pretraining. Fine-tuning a Nepali Conformer model on the limited target-language data, combined with data augmentation, reduces the character error rate (CER) from 52.54% to 17.59%, matching the performance of the multilingual Whisper-Small model. This work establishes the first ASR benchmark for Newari that preserves the native Devanagari script and shows that proximal cross-lingual transfer is both effective and computationally efficient within the South Asian linguistic context.

๐Ÿ“ Abstract
Nepal Bhasha (Newari), an endangered language of the Kathmandu Valley, remains digitally marginalized due to the severe scarcity of annotated speech resources. In this work, we introduce Nwāchā Munā, a newly curated 5.39-hour manually transcribed Devanagari speech corpus for Nepal Bhasha, and establish the first benchmark using script-preserving acoustic modeling. We investigate whether proximal cross-lingual transfer from a geographically and linguistically adjacent language (Nepali) can rival large-scale multilingual pretraining in an ultra-low-resource Automatic Speech Recognition (ASR) setting. Fine-tuning a Nepali Conformer model reduces the Character Error Rate (CER) from a 52.54% zero-shot baseline to 17.59% with data augmentation, effectively matching the performance of the multilingual Whisper-Small model despite using significantly fewer parameters. Our findings demonstrate that proximal transfer within South Asian language clusters serves as a computationally efficient alternative to massive multilingual models. We openly release the dataset and benchmarks to digitally enable the Newari community and foster further research in Nepal Bhasha.
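The headline result is stated in CER, so a short sketch of how that metric is usually computed may help when reading the numbers. This is a minimal illustration assuming the standard corpus-level definition (total character edit distance divided by total reference characters) over Unicode code points; the paper's exact scoring script, text normalization, and the helper names below (`levenshtein`, `cer`, `relative_reduction`) are assumptions, not the authors' code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (hypothetical helper, not from the paper)."""
    # Single-row dynamic programming over Unicode code points.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error removed (hypothetical helper)."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Dropping the vowel sign "े" from "नेपाल" is one code-point deletion.
    print(levenshtein("नेपाल", "नपाल"))
    # Relative reduction implied by the reported 52.54% -> 17.59% CER.
    print(round(relative_reduction(52.54, 17.59), 3))
```

Because this sketch counts Unicode code points, a missing Devanagari vowel sign (matra) costs one edit even though it is not a standalone letter; scoring over grapheme clusters or applying Unicode normalization first would give somewhat different numbers, and the abstract does not say which convention was used. Under the reported figures, the drop from 52.54% to 17.59% removes roughly two thirds (about 66.5%) of the zero-shot character errors.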
Problem

Research questions and friction points this paper is trying to address.

Nepal Bhasha
endangered language
low-resource ASR
speech corpus
digital marginalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

proximal transfer
low-resource ASR
Devanagari speech corpus
endangered language
Conformer
Rishikesh Kumar Sharma
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Safal Narshing Shrestha
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Jenny Poudel
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Rupak Tiwari
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Arju Shrestha
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Rupak Raj Ghimire
Information and Language Processing Research Lab, Kathmandu University, Dhulikhel, Nepal
Bal Krishna Bal
Professor of Computer Engineering, Kathmandu University
Natural Language Processing, Sentiment Analysis, Software Localization