🤖 AI Summary
Indonesia’s 700+ indigenous languages remain severely underrepresented in NLP, yet the actual technology needs of their speech communities have never been empirically characterized. This gap contributes to high development costs and to tools that adapt poorly to local contexts. This study presents the first nationwide, multilingual empirical survey across Indonesia’s diverse language communities, combining structured questionnaires, stratified sampling, in-depth interviews, and quantitative analysis to prioritize needs. The results identify machine translation and information retrieval as the highest-priority applications. Public enthusiasm for AI is strong, but trust remains low, and privacy, algorithmic bias, and data transparency emerge as critical governance concerns. The study proposes a “demand-driven + ethics-first” framework for developing localized language technology. It fills a key empirical gap in needs assessment for low-resource language technology and offers a reproducible methodological paradigm for multilingual AI governance worldwide.
📝 Abstract
There is an emerging effort to develop NLP for Indonesia’s 700+ local languages, but progress remains costly because it requires direct engagement with native speakers. Moreover, it is unclear what these language communities actually need from language technology. To address this, we conduct a nationwide survey to assess the needs of native speakers in Indonesia. Our findings indicate that overcoming language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advances in language technology, concerns about privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.