Classifying several dialectal Nawatl varieties

📅 2026-01-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two obstacles that have significantly hindered computational linguistic research on Nawatl (Nahuatl), an endangered Indigenous language of Mexico: the scarcity of digital resources and the language's rich dialectal variation. For the first time, modern machine learning and neural network approaches are systematically applied to automatic dialect classification for Nawatl. Working with limited textual data, the authors develop and train models that distinguish among multiple dialectal varieties. The work fills a gap in the computational treatment of low-resource Indigenous languages and establishes a reproducible technical framework that can inform digital preservation efforts for other endangered languages facing similar resource constraints.

📝 Abstract

Mexico is a country with a large number of Indigenous languages, the most widely spoken of which is Nawatl, with more than two million current speakers (mainly in North and Central America). Despite a rich cultural heritage dating back to the 15th century, Nawatl has few computational resources. The problem is compounded by its dialectal variation: approximately 30 varieties are recognised, not counting the different spellings used in the written forms of the language. In this research work, we address the problem of classifying Nawatl varieties using machine learning and neural networks.
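The abstract does not say which features or models the authors used, so the following is only a minimal sketch of the general task of classifying text by variety. It assumes character n-gram features (a common choice when spelling conventions differ between sources) and a multinomial naive Bayes classifier with add-one smoothing; neither choice is confirmed by the source. The class name `NaiveBayesDialectClassifier`, the labels `"A"` and `"B"`, and the training strings are hypothetical placeholders, not real Nawatl data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDialectClassifier:
    """Hypothetical sketch: multinomial naive Bayes over character n-grams.

    This is an illustrative stand-in, not the method from the paper.
    """

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()             # number of training texts per label
        self.ngram_counts = defaultdict(Counter)  # n-gram frequencies per label
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus add-one-smoothed log likelihood for each label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(doc_count / total_docs)
            for g in grams:
                score += math.log((counts[g] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Synthetic placeholder strings and labels, NOT real Nawatl varieties.
train_texts = ["kala kali kalo", "kala kolo kali", "tepa tepe tipa", "tepe tapa tepi"]
train_labels = ["A", "A", "B", "B"]

clf = NaiveBayesDialectClassifier(n=3).fit(train_texts, train_labels)
print(clf.predict("kali kala"))
print(clf.predict("tepa tipe"))
```

Character n-grams are used here because they capture sub-word spelling and morphological cues without requiring a tokenizer or lexicon, which suits a setting with few computational resources; a real system would need genuine labeled corpora for each variety and a held-out evaluation.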

Problem

Research questions and friction points this paper is trying to address.

Nawatl

dialect classification

indigenous languages

language varieties

low-resource languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nawatl dialect classification

low-resource language

machine learning