SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning

📅 2025-06-03
🤖 AI Summary
How can generative AI be made stable, age-appropriate, and pedagogically effective for multilingual, multimodal educational interactions with children? This paper introduces SingaKids, the first conversational tutoring system designed specifically for school-aged children; it teaches language through picture description tasks in four languages (English, Mandarin, Malay, and Tamil). It integrates dense image captioning, multimodal fusion (speech, vision, text), multilingual large language models, and pedagogically grounded instructional scaffolding. Designed to support both language acquisition and cognitive development, SingaKids uses simplified instructions, culturally attuned feedback, and progressively scaffolded interactions. Technically, it combines multilingual pretraining, task-specific fine-tuning, automatic speech recognition (ASR) and text-to-speech (TTS) pipelines, and joint vision-language modeling. Empirical evaluation in primary school classrooms demonstrates statistically significant improvements in language proficiency across all language groups, robust cross-lingual generalization, and sustained high engagement among children. This work establishes a novel paradigm for trustworthy, scalable, multilingual AI-driven education tailored to young learners.

📝 Abstract
The integration of generative artificial intelligence into educational applications has enhanced personalized and interactive learning experiences, and it shows strong potential to promote young learners' language acquisition. However, ensuring consistent and robust performance across different languages and cultural contexts remains challenging, and kid-friendly design requires simplified instructions, engaging interactions, and age-appropriate scaffolding to maintain motivation and optimize learning outcomes. In this work, we introduce SingaKids, a dialogic tutor designed to facilitate language learning through picture description tasks. Our system integrates dense image captioning, multilingual dialogic interaction, speech understanding, and engaging speech generation to create an immersive learning environment in four languages: English, Mandarin, Malay, and Tamil. We further improve the system through multilingual pre-training, task-specific tuning, and scaffolding optimization. Empirical studies with elementary school students demonstrate that SingaKids provides effective dialogic teaching, benefiting learners at different performance levels.

Problem

Research questions and friction points this paper is trying to address.

Ensuring consistent multilingual performance in educational AI
Designing engaging kid-friendly language learning interactions
Integrating multimodal features for immersive language tutoring

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense image captioning for visual learning
Multilingual dialogic interaction for engagement
Speech understanding and generation integration
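The three contributions listed above work together in every tutoring exchange: the picture is captioned, the child's speech is transcribed, a multilingual model replies, and the reply is spoken back. The following is a minimal Python sketch of how one picture-description turn could chain these steps, assuming a caption-once-then-converse design. Every function name, signature, language code, and the prompt wording (including the "praise, then ask about an unmentioned detail" scaffolding instruction) are hypothetical stand-ins, not SingaKids' actual models, APIs, or prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component signatures: the source names these capabilities,
# not their interfaces, so these aliases are illustrative assumptions.
Captioner = Callable[[str], List[str]]     # image path -> dense region captions
Transcriber = Callable[[bytes, str], str]  # (child audio, language) -> text
Responder = Callable[[str], str]           # prompt -> tutor reply (e.g. an LLM)
Synthesizer = Callable[[str, str], bytes]  # (reply text, language) -> audio

# The four languages named in the abstract, keyed by ISO 639-1 codes.
SUPPORTED_LANGUAGES = {"en": "English", "zh": "Mandarin", "ms": "Malay", "ta": "Tamil"}


@dataclass
class TutorSession:
    image: str
    language: str
    captions: List[str]
    history: List[Dict[str, str]] = field(default_factory=list)


def start_session(image: str, language: str, caption: Captioner) -> TutorSession:
    """Caption the picture once, up front, so every turn is grounded in it."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return TutorSession(image, language, caption(image))


def build_prompt(session: TutorSession, child_text: str) -> str:
    """Combine picture facts, dialogue history, and an assumed scaffolding instruction."""
    facts = "; ".join(session.captions)
    turns = "\n".join(f"{t['role']}: {t['text']}" for t in session.history)
    return (
        f"Respond in {SUPPORTED_LANGUAGES[session.language]}.\n"
        f"Picture details: {facts}\n"
        f"{turns}\nchild: {child_text}\n"
        "Give brief praise, then ask one short question about a detail "
        "the child has not mentioned yet."
    )


def tutor_turn(session: TutorSession, audio: bytes, transcribe: Transcriber,
               respond: Responder, synthesize: Synthesizer) -> bytes:
    """One dialogic turn: child speech -> text -> tutor reply -> tutor speech."""
    child_text = transcribe(audio, session.language)
    reply = respond(build_prompt(session, child_text))
    session.history.append({"role": "child", "text": child_text})
    session.history.append({"role": "tutor", "text": reply})
    return synthesize(reply, session.language)


if __name__ == "__main__":
    # Stub components stand in for real captioning, ASR, LLM, and TTS models.
    session = start_session("park.jpg", "en",
                            lambda img: ["a girl flying a red kite", "a dog on the grass"])
    audio_out = tutor_turn(
        session, b"\x00",
        transcribe=lambda audio, lang: "I see a kite",
        respond=lambda prompt: "Great! What is the dog doing?",
        synthesize=lambda text, lang: text.encode("utf-8"),
    )
    print(audio_out)
```

Passing the models in as plain callables keeps the sketch independent of any particular ASR, LLM, or TTS library, and captioning once per session reflects that the picture stays fixed while the dialogue around it progresses.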