State-of-the-Art Translation of Text-to-Gloss using mBART: A Case Study of Bangla

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the longstanding scarcity of data and research for Bangla-to-gloss (sign language morpheme) translation, this work introduces the first high-quality Bangla→gloss parallel dataset. We propose a data augmentation paradigm combining rule-driven synthetic generation with LLM-assisted back-translation. We further observe that the masking and sentence-shuffling objectives of mBART-50 pretraining align naturally with the reordered, morphologically reduced character of gloss, motivating a fine-tuning strategy tailored to gloss translation. Experiments show our model achieves a SacreBLEU score of 79.53 on our curated Bangla dataset and outperforms all prior state-of-the-art methods on the PHOENIX-14T benchmark (SacreBLEU = 63.89, COMET = 0.624), leading on all six evaluation metrics. This is the first successful Bangla-to-gloss translation system, demonstrating strong cross-lingual generalization and establishing a reusable data curation and modeling framework for low-resource sign language translation.
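The summary centers on fine-tuning mBART-50 for text-to-gloss translation. Below is a minimal sketch of what such a setup could look like with Hugging Face Transformers; the hyperparameters, the reuse of the `bn_IN` language code on the gloss side (gloss has no language code of its own), and the toy data are assumptions, not details taken from the paper.

```python
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/mbart-large-50"
# Assumption: gloss output reuses the Bangla language code "bn_IN".
tokenizer = MBart50TokenizerFast.from_pretrained(
    model_name, src_lang="bn_IN", tgt_lang="bn_IN"
)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Toy placeholder pairs; real data would be Bangla sentences and gloss sequences.
pairs = {
    "text":  ["<Bangla sentence 1>", "<Bangla sentence 2>"],
    "gloss": ["<gloss sequence 1>", "<gloss sequence 2>"],
}
train_ds = Dataset.from_dict(pairs)

def preprocess(batch):
    # Tokenize sources and targets; targets become the training labels.
    enc = tokenizer(batch["text"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(
        text_target=batch["gloss"], truncation=True, max_length=128
    )["input_ids"]
    return enc

train_ds = train_ds.map(preprocess, batched=True, remove_columns=["text", "gloss"])

# Hyperparameters are illustrative, not the paper's reported settings.
args = Seq2SeqTrainingArguments(
    output_dir="mbart50-bn2gloss",
    per_device_train_batch_size=8,
    num_train_epochs=5,
    learning_rate=3e-5,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```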

📝 Abstract
Despite a large deaf population of 1.7 million, Bangla Sign Language (BdSL) remains an understudied domain. In particular, there is no prior work on the Bangla text-to-gloss translation task. To address this gap, we begin with the dataset problem. We take inspiration from the grammatical rule-based gloss generation used for German Sign Language and American Sign Language (ASL) and adapt it for BdSL. We also leverage an LLM to generate synthetic data, using back-translation and text generation for augmentation. With the dataset prepared, we began experimentation. We fine-tuned the pretrained mBART-50 and mBERT (multilingual, uncased) models on our dataset. We also trained GRU and RNN baselines and a novel seq-to-seq model with multi-head attention. We observe notably high performance (SacreBLEU = 79.53) when fine-tuning the pretrained mBART-50 multilingual model from Facebook. Investigating why, we noted an interesting property of mBART: it was pretrained on shuffled and masked text, and gloss is likewise a reordered, reduced form of text. We therefore hypothesize that mBART is inherently well suited to text-to-gloss tasks. To test this hypothesis, we fine-tuned mBART-50 on the PHOENIX-14T benchmark and compared it against the existing literature. Our mBART-50 fine-tune demonstrated state-of-the-art performance on PHOENIX-14T, far outperforming existing models on all six metrics (SacreBLEU = 63.89, BLEU-1 = 55.14, BLEU-2 = 38.07, BLEU-3 = 27.13, BLEU-4 = 20.68, COMET = 0.624). Based on these results, this study proposes a new paradigm for the text-to-gloss task using mBART models. Additionally, our results show that the BdSL text-to-gloss task can greatly benefit from rule-based synthetic datasets.
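The abstract reports SacreBLEU and COMET among its six metrics. A minimal sketch of how such corpus-level scores are typically computed with the `sacrebleu` and `unbabel-comet` packages follows; the COMET checkpoint name and the toy strings are illustrative assumptions, not the paper's evaluation pipeline.

```python
# pip install sacrebleu unbabel-comet
import sacrebleu
from comet import download_model, load_from_checkpoint

srcs = ["<Bangla source 1>"]    # source sentences (toy placeholders)
hyps = ["<predicted gloss 1>"]  # system outputs
refs = ["<reference gloss 1>"]  # gold glosses

# Corpus-level SacreBLEU: a hypothesis list plus a list of reference lists.
print("SacreBLEU:", sacrebleu.corpus_bleu(hyps, [refs]).score)

# COMET is reference-based and also consumes the source sentence.
ckpt = download_model("Unbabel/wmt22-comet-da")  # checkpoint choice is an assumption
comet = load_from_checkpoint(ckpt)
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print("COMET:", comet.predict(data, batch_size=8).system_score)
```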
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of Bangla text-to-gloss translation research
Developing synthetic datasets for Bangla Sign Language (BdSL)
Proposing mBART as an optimal model for text-to-gloss tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned mBART-50 for text-to-gloss translation
Used rule-based synthetic data generation (a toy sketch follows this list)
Trained a seq-to-seq model with multi-head attention
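As a toy illustration of the rule-based generation idea above, the sketch below drops function words to turn a Bangla sentence into a gloss-like sequence. The word list and the single rule are invented for this sketch and are not the paper's actual BdSL rules, which would operate on grammatical analyses rather than plain token filtering.

```python
# Hypothetical Bangla function words/particles dropped during glossing.
FUNCTION_WORDS = {"টি", "টা", "কি", "ও", "এবং"}

def text_to_gloss(sentence: str) -> str:
    """Toy rule: keep content words in order, drop function words.

    Real rule-based pipelines also reorder constituents and reduce
    morphology using POS tags or dependency parses.
    """
    tokens = [t for t in sentence.split() if t not in FUNCTION_WORDS]
    return " ".join(tokens)

# "I and you go to school" -> drops the conjunction "এবং"
print(text_to_gloss("আমি এবং তুমি স্কুলে যাই"))  # -> "আমি তুমি স্কুলে যাই"
```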
Sharif Md. Abdullah
IIT, University of Dhaka, Bangladesh
Abhijit Paul
IIT, University of Dhaka, Bangladesh
Shebuti Rayana
SUNY, Old Westbury, USA
Ahmedul Kabir
Associate Professor, IIT, University of Dhaka
NLP, AI/ML, Health Informatics, Software Analytics
Zarif Masud
Toronto Metropolitan University, Canada