🤖 AI Summary
This study addresses the gap in formative feedback research for mathematical reasoning instruction in non-English, multilingual educational contexts. Method: We propose a novel LLM-to-LLM pedagogical simulation paradigm: strong LLMs act as “teachers” generating linguistically adapted hints, while weaker LLMs serve as “students” performing step-by-step reasoning, covering 11 languages, including seven low-resource ones. Through 352 controlled experiments, cross-lingual input–feedback pairing, and standardized evaluation, we assess instructional efficacy. Contribution/Results: Native-language-aligned feedback significantly enhances learning outcomes, yielding an average 19.3% accuracy gain in low-resource language settings. We further introduce the first analytical framework to jointly model linguistic properties, model capabilities, and prompting strategies, empirically confirming that language resource availability and model–task alignment are critical moderators of educational effectiveness.
📝 Abstract
Large language models (LLMs) have demonstrated the ability to generate formative feedback and instructional hints in English, making them increasingly relevant for AI-assisted education. However, their capacity to provide effective instructional support across languages, especially for mathematically grounded reasoning tasks, remains largely unexamined. In this work, we present the first large-scale simulation of multilingual tutor-student interactions using LLMs. A stronger model plays the role of the tutor, generating feedback in the form of hints, while a weaker model simulates the student. We explore 352 experimental settings across 11 typologically diverse languages, four state-of-the-art LLMs, and multiple prompting strategies to assess whether language-specific feedback leads to measurable learning gains. Our study examines how student input language, teacher feedback language, model choice, and language resource level jointly influence performance. Results show that multilingual hints can significantly improve learning outcomes, particularly in low-resource languages when feedback is aligned with the student's native language. These findings offer practical insights for developing multilingual, LLM-based educational tools that are both effective and inclusive.
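The tutor-student pairing described above can be pictured as a simple interaction loop: the student attempts a problem, and if the first attempt fails, the tutor supplies a hint in the chosen feedback language before the student retries. The sketch below is purely illustrative; all function names and the toy model stand-ins are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of one tutor-student interaction round, assuming the
# tutor and student are callables wrapping a strong and a weak LLM.
# All names here are hypothetical, not from the paper's codebase.

def simulate_round(problem, answer, student, tutor, feedback_lang):
    """Attempt -> hint (in feedback_lang) -> re-attempt.

    Returns (solved, needed_hint): whether the final answer was correct,
    and whether a hint was required to get there.
    """
    first_try = student(problem, hint=None)
    if first_try == answer:
        return True, False                      # solved without help
    hint = tutor(problem, first_try, feedback_lang)  # feedback in target language
    second_try = student(problem, hint=hint)
    return second_try == answer, True           # solved only with the hint?

# Toy stand-ins so the sketch runs: a "student" that succeeds only with a hint,
# and a "tutor" that tags its hint with the feedback language code.
def toy_student(problem, hint=None):
    return "4" if hint else "5"

def toy_tutor(problem, wrong_answer, lang):
    return f"[{lang}] Check your addition again."

solved, needed_hint = simulate_round("2 + 2 = ?", "4",
                                     toy_student, toy_tutor, feedback_lang="sw")
```

In the study's setting, aggregating `solved` over many problems per (input language, feedback language, model) cell is what yields the accuracy comparisons reported above.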