Reconstructing Syllable Sequences in Abugida Scripts with Incomplete Inputs

📅 2025-05-16
🤖 AI Summary
This paper addresses syllable sequence reconstruction from incomplete input for six languages written in abugida scripts (Bengali, Hindi, Khmer, Lao, Burmese, and Thai). It considers four types of incomplete input: consonant-only sequences, vowel-only sequences, syllables with random character deletions, and sequences with fixed-syllable masking. The authors propose a multilingual Transformer-based sequence-to-sequence model jointly trained on the Asian Language Treebank (ALT). To their knowledge, this is the first systematic cross-lingual evaluation of reconstruction performance across diverse abugida writing systems and input types. Results show that consonant sequences are the strongest predictive cue, whereas reconstruction from vowel sequences alone is substantially harder. The model achieves high BLEU scores on consonant-driven tasks and remains robust in partial and masked-syllable reconstruction. These findings offer a practical foundation for text prediction, spelling correction, and data augmentation in low-resource abugida languages.

📝 Abstract
This paper explores syllable sequence prediction in Abugida languages using Transformer-based models, focusing on six languages: Bengali, Hindi, Khmer, Lao, Myanmar, and Thai, from the Asian Language Treebank (ALT) dataset. We investigate the reconstruction of complete syllable sequences from various incomplete input types, including consonant sequences, vowel sequences, partial syllables (with random character deletions), and masked syllables (with fixed syllable deletions). Our experiments reveal that consonant sequences play a critical role in accurate syllable prediction, achieving high BLEU scores, while vowel sequences present a significantly greater challenge. The model demonstrates robust performance across tasks, particularly in handling partial and masked syllable reconstruction, with strong results for tasks involving consonant information and syllable masking. This study advances the understanding of sequence prediction for Abugida languages and provides practical insights for applications such as text prediction, spelling correction, and data augmentation in these scripts.
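The four incomplete input types named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' preprocessing code. The function names are hypothetical, and the consonant/vowel split is approximated with Unicode general categories: any combining mark counts as a vowel sign, which also catches viramas and nasalisation marks and treats independent vowels as consonants. That approximation is reasonable for Devanagari or Bengali but not for Thai or Lao, where many vowels are encoded as letters. The deletion probability and the masking interval are assumed parameters that the abstract does not specify.

```python
import random
import unicodedata

# Toy Hindi example, already split into syllables: कि + ता + ब ("kitab", book).
SYLLABLES = ["\u0915\u093f", "\u0924\u093e", "\u092c"]


def is_mark(ch):
    """Assumption: combining marks (Mn/Mc) stand in for dependent vowel signs."""
    return unicodedata.category(ch).startswith("M")


def consonant_sequence(syllables):
    """Keep only base letters, dropping every combining mark."""
    return ["".join(c for c in s if not is_mark(c)) for s in syllables]


def vowel_sequence(syllables):
    """Keep only combining marks, dropping every base letter."""
    return ["".join(c for c in s if is_mark(c)) for s in syllables]


def partial_syllables(syllables, drop_prob, rng):
    """Delete each character independently with probability drop_prob (assumed scheme)."""
    return ["".join(c for c in s if rng.random() >= drop_prob) for s in syllables]


def masked_syllables(syllables, every=2, mask="<mask>"):
    """Replace every `every`-th syllable with a mask token (assumed fixed pattern)."""
    return [mask if (i + 1) % every == 0 else s for i, s in enumerate(syllables)]


if __name__ == "__main__":
    print(consonant_sequence(SYLLABLES))
    print(vowel_sequence(SYLLABLES))
    print(partial_syllables(SYLLABLES, 0.3, random.Random(0)))
    print(masked_syllables(SYLLABLES))
```

In each case the model's source would be the degraded sequence and the target the original syllable sequence, so every function above produces one kind of source side for the same target.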
Problem

Research questions and friction points this paper is trying to address.

Predicting syllable sequences in Abugida languages with incomplete inputs
Reconstructing sequences from consonant, vowel, partial, or masked inputs
Evaluating Transformer models for text prediction in six Asian languages
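
The paper reports BLEU as its evaluation metric, but this page does not state the exact configuration. As a rough sketch only, the function below computes unsmoothed sentence-level BLEU over syllable tokens against a single reference. Scoring at the syllable level, the 4-gram limit, and the absence of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU with uniform weights and one reference (illustrative only)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        total = sum(hyp.values())
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(overlap / total) / max_n
    # Brevity penalty: penalise hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = ["s1", "s2", "s3", "s4", "s5"]   # placeholder syllable tokens
    hypothesis = ["s1", "s2", "s3", "s4", "sX"]  # one wrong syllable at the end
    print(sentence_bleu(reference, hypothesis))
```

Published BLEU figures are normally corpus-level and depend on tokenization, so scores from a sketch like this are not comparable with the paper's numbers.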
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based models for Abugida syllable prediction
Handling incomplete inputs like consonant and vowel sequences
Robust performance in partial and masked syllable reconstruction