🤖 AI Summary
Problem: Indigenous languages in Argentina are severely endangered, and they lack both systematic language resource repositories and linguistically appropriate computational tools.
Method: This study introduces the first nationally comprehensive classification framework and digital resource atlas covering seven language families and over 30 indigenous languages. It integrates demographic data, NLP resources, and speech corpora to conduct cross-regional dialectal resource assessment, employing linguistic typology, metadata standardization, systematic literature review, and demographic analysis.
Contribution/Results: We deliver Argentina’s first national inventory of indigenous language resources, precisely identifying critical gaps in speech recognition, tokenization, and lexicography. Based on quantitative and qualitative evaluation, we propose a tiered prioritization schema for resource development. The atlas establishes a scalable, empirically grounded methodology for endangered language documentation, digital preservation, and cultural revitalization, bridging linguistic scholarship with computational infrastructure for under-resourced languages.
📝 Abstract
Argentina has a diverse, yet little-known, Indigenous language heritage. Most of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, no unified information on speakers and computational tools is available for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, along with national demographic data on the country's Indigenous population. The languages are classified into seven families: Mapuche, Tupí-Guaraní, Guaycurú, Quechua, Mataco-Mataguaya, Aymara, and Chon. We also provide an introductory survey of the computational resources available for these languages, whether or not they are specifically developed for Argentine varieties.