Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

📅 2023-06-16
🏛️ AMERICASNLP
📈 Citations: 6
Influential: 0
🤖 AI Summary
This paper addresses the AmericasNLP 2023 shared task on machine translation from Spanish into eleven low-resource Indigenous American languages (e.g., Aymara, Guarani, Quechua). The authors systematically adapt the multilingual NLLB-200 model to this scenario via cross-lingual data augmentation and domain-adaptive fine-tuning, integrating heterogeneous parallel corpora including constitutions, news articles, and handbooks. Within a supervised fine-tuning framework, they optimize directly for chrF. The approach achieves state-of-the-art performance: the highest average chrF on the test set, first place in four individual languages, and top-three rankings across all eleven target languages. Key contributions include the first systematic extension of NLLB-200 to Indigenous American languages and empirical validation that multi-source data fusion and domain adaptation significantly improve translation quality in extremely low-resource settings.
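The paper itself does not include code; as a rough illustration of this kind of adaptation, below is a minimal sketch of fine-tuning NLLB-200 on a Spanish–Guarani pair with Hugging Face transformers. The checkpoint choice, paths, example sentences, and hyperparameters are assumptions for illustration, not the paper's actual settings.

```python
# Hypothetical sketch: fine-tune NLLB-200 on a Spanish->Guarani parallel
# corpus with Hugging Face transformers. Paths, example data, and
# hyperparameters are illustrative assumptions, not the paper's settings.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/nllb-200-distilled-600M"  # smallest NLLB-200 variant
tokenizer = AutoTokenizer.from_pretrained(
    model_name, src_lang="spa_Latn", tgt_lang="grn_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy parallel data; in practice this would be the shared-task corpora
# plus extra sources such as constitutions, handbooks, and news.
pairs = [{"es": "Buenos días.", "gn": "Mba'éichapa ndepyhareve."}]
dataset = Dataset.from_list(pairs)

def preprocess(batch):
    # Tokenize source and target; the NLLB tokenizer adds language tags.
    return tokenizer(batch["es"], text_target=batch["gn"],
                     truncation=True, max_length=128)

tokenized = dataset.map(preprocess, remove_columns=["es", "gn"])

args = Seq2SeqTrainingArguments(
    output_dir="nllb200_es_gn",      # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```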
📝 Abstract
The University of Sheffield took part in the AmericasNLP 2023 shared task for all eleven language pairs. Our models consist of different variations of the NLLB-200 model trained on data provided by the organizers and on data available from various sources, such as constitutions, handbooks, and news articles. Our models outperform the baseline model on the development set in chrF, with substantial improvements particularly for Aymara, Guarani, and Quechua. On the test set, our best submission achieves the highest average chrF of all submissions; we rank first in four of the eleven languages, and at least one of our models ranks in the top three for all languages.
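Submissions are ranked by chrF, which can be computed with the sacrebleu library; a minimal example with made-up sentences:

```python
# Minimal chrF example with sacrebleu (the metric used to rank systems).
# The sentences here are made up for illustration only.
from sacrebleu.metrics import CHRF

hypotheses = ["iyambae jaiko vaerä"]       # system outputs
references = [["iyambae jaikuae vaerä"]]   # one reference stream
chrf = CHRF()  # default: character n-grams up to order 6, beta=2
score = chrf.corpus_score(hypotheses, references)
print(score)   # prints something like "chrF2 = ..."
```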
Problem

Research questions and friction points this paper is trying to address.

Machine Translation into Indigenous Languages
Spanish to eleven Indigenous languages
NLLB-200 model enhancement and ensembling (see the sketch after this list)
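Neither the abstract nor this page details the ensembling method. One simple, commonly used way to combine several fine-tuned checkpoints is weight averaging; the sketch below is a hypothetical stand-in with made-up checkpoint paths, not the paper's actual procedure.

```python
# Hypothetical sketch: combine several fine-tuned NLLB-200 checkpoints by
# averaging their weights (a simple stand-in for model ensembling).
# Checkpoint paths are illustrative, not from the paper.
import torch
from transformers import AutoModelForSeq2SeqLM

checkpoints = ["nllb200_es_gn/ckpt_a", "nllb200_es_gn/ckpt_b"]
states = [AutoModelForSeq2SeqLM.from_pretrained(c).state_dict()
          for c in checkpoints]

# Element-wise mean of every parameter across the checkpoints.
averaged = {
    key: torch.stack([s[key].float() for s in states]).mean(dim=0)
    for key in states[0]
}

merged = AutoModelForSeq2SeqLM.from_pretrained(checkpoints[0])
merged.load_state_dict(averaged)
merged.save_pretrained("nllb200_es_gn/merged")
```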
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extended NLLB-200 variations
Used diverse data sources (see the sketch after this list)
Achieved top chrF scores
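As an illustration of pooling the diverse sources mentioned above (constitutions, handbooks, news) into one training corpus, here is a hypothetical sketch that merges tab-separated parallel files and drops duplicate sentence pairs; all file names are made up.

```python
# Hypothetical sketch: pool parallel data from several sources into one
# training corpus, deduplicating exact sentence pairs. File names are
# made up; each file is assumed to hold "spanish<TAB>target" lines.
import csv

sources = ["constitution.tsv", "handbook.tsv", "news.tsv"]
seen, merged = set(), []
for path in sources:
    with open(path, encoding="utf-8") as f:
        for src, tgt in csv.reader(f, delimiter="\t"):
            pair = (src.strip(), tgt.strip())
            if pair not in seen and all(pair):  # skip dupes and empties
                seen.add(pair)
                merged.append(pair)

with open("train.tsv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(merged)
```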