🤖 AI Summary
This work addresses the marginalization of oral-dominant Indigenous languages, such as Guaraní, an official language of Paraguay, in mainstream AI systems. These systems are overwhelmingly text-centric and therefore support authentic spoken interaction poorly. To overcome this limitation, the paper proposes a speech-first multi-agent architecture as an alternative to the conventional text-to-speech pipeline. The architecture decouples language understanding from dedicated agents for dialogue state management and community-led governance, and it explicitly models turn-taking, repair strategies, and shared context. By treating spoken interaction as a first-class design element, the framework foregrounds cultural embeddedness and data sovereignty, challenges the dominance of text-centered paradigms, and respects diglossia and Indigenous knowledge systems, offering a pathway toward culturally responsive AI.
📝 Abstract
Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaraní, an official and widely spoken language of Paraguay, as a case study to argue that language support in AI remains insufficient unless it aligns with lived oral practices. As an alternative to the standard "text-to-speech" pipeline, we propose an oral-first multi-agent architecture. By decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, we demonstrate a technical framework that respects indigenous data sovereignty and diglossia. Our work moves beyond mere recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction. We conclude that for AI to be truly culturally grounded, it must shift from adapting oral languages to text-centric systems to treating spoken conversation as a first-class design requirement, ensuring digital ecosystems empower rather than overlook diverse linguistic practices.
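The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: every class name, the 0.6 confidence threshold, the restricted-term rule, and the English placeholder transcripts are hypothetical, chosen only to show an understanding agent, a conversation-state agent, and a governance agent kept separate, with repair handled as part of turn-taking.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Utterance:
    speaker: str
    transcript: str    # placeholder for the content of a spoken segment
    confidence: float  # hypothetical recognizer confidence in [0, 1]


@dataclass
class ConversationState:
    """Shared context: who holds the floor and what has been understood."""
    floor: str = "user"
    history: List[str] = field(default_factory=list)


class UnderstandingAgent:
    """Stand-in for Guaraní language understanding (no real model here)."""

    def interpret(self, utt: Utterance) -> Optional[str]:
        # Assumption: below the threshold the turn counts as not understood.
        return utt.transcript if utt.confidence >= 0.6 else None


class GovernanceAgent:
    """Stand-in for community-defined rules on what may be retained."""

    def __init__(self, restricted_terms: Set[str]):
        self.restricted_terms = {t.lower() for t in restricted_terms}

    def may_store(self, utt: Utterance) -> bool:
        words = set(utt.transcript.lower().split())
        return not (words & self.restricted_terms)


class StateAgent:
    """Owns turn-taking and repair, independent of the other two agents."""

    def __init__(self, understanding: UnderstandingAgent,
                 governance: GovernanceAgent):
        self.understanding = understanding
        self.governance = governance
        self.state = ConversationState()

    def take_turn(self, utt: Utterance) -> str:
        meaning = self.understanding.interpret(utt)
        if meaning is None:
            # Repair: the user keeps the floor and is asked to repeat.
            self.state.floor = "user"
            return "repair"
        # Only content the governance agent allows enters shared context.
        if self.governance.may_store(utt):
            self.state.history.append(meaning)
        self.state.floor = "system"
        return "respond"


agent = StateAgent(UnderstandingAgent(), GovernanceAgent({"restricted"}))
first = agent.take_turn(Utterance("user", "a greeting", 0.9))
second = agent.take_turn(Utterance("user", "unclear audio", 0.2))
third = agent.take_turn(Utterance("user", "a restricted story", 0.95))
```

In this sketch the governance check affects only what is retained in shared context, not whether the system responds, so the retention policy can change without touching the understanding or turn-taking logic.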