🤖 AI Summary
This study addresses two obstacles to computational linguistic research on Nahuatl (also written Nawatl), an endangered Indigenous language of Mexico: the scarcity of digital resources and the language's rich dialectal variation. For the first time, modern machine learning and neural network approaches are systematically applied to automatic dialect classification for Nahuatl. Working from limited textual data, the authors develop and train models that distinguish among multiple dialectal variants. The work fills a gap in the computational treatment of low-resource Indigenous languages and offers a reproducible technical framework that can inform digital preservation efforts for other endangered languages facing similar resource constraints.
📝 Abstract
Mexico is home to a large number of indigenous languages. The most widely spoken is Nawatl, which currently has more than two million speakers, mainly in North and Central America. Despite a rich cultural heritage that dates back to the 15th century, Nawatl has few computational resources. The problem is compounded for its dialectal varieties: approximately 30 are recognised, not counting the different spellings used in the written forms of the language. In this work, we address the problem of classifying Nawatl varieties using machine learning and neural networks.