🤖 AI Summary
This study investigates the efficacy and dependency risks of generative AI (LLM-Tutor) in learning mathematical proof. Grounded in learning sciences, we designed an iterative AI tutoring system integrating LLM-driven automated proof grading and interactive mathematical Q&A. A mixed-methods approach—comprising controlled experiments, mediation analysis, and qualitative interviews—was employed to evaluate pedagogical impact. Results indicate significant improvements in homework performance, but no statistically significant gains in exam scores or time on task. Notably, students with lower self-efficacy engaged more frequently with the tools: their chatbot use partially mediated lower midterm scores, whereas their use of the proof-review tutor contributed positively to final exam scores. The key contribution lies in the first empirical integration of generative AI with self-efficacy theory, revealing how perceived competence shapes AI usage patterns and consequent learning trajectories—thereby informing differentiated design principles and ethically grounded deployment of educational AI.
📝 Abstract
We evaluate the effectiveness of LLM-Tutor, a large language model (LLM)-powered tutoring system that combines an AI-based proof-review tutor for real-time feedback on proof-writing with a chatbot for mathematics-related queries. Our experiment, involving 148 students, demonstrated that the use of LLM-Tutor significantly improved homework performance compared to a control group without access to the system. However, its impact on exam performance and time spent on tasks was insignificant. Mediation analysis revealed that students with lower self-efficacy tended to use the chatbot more frequently, which partially contributed to lower midterm scores. Furthermore, students with lower self-efficacy were more likely to engage frequently with the proof-review tutor, a usage pattern that positively contributed to higher final exam scores. Interviews with 19 students highlighted the accessibility of LLM-Tutor and its effectiveness in addressing learning needs, while also revealing limitations and concerns regarding potential over-reliance on the tool. Our results suggest that generative AI alone, such as a chatbot, may not suffice for comprehensive learning support, underscoring the need for iterative design improvements grounded in learning sciences principles for generative AI educational tools like LLM-Tutor.