🤖 AI Summary
This study investigates the efficacy and dependency risks of generative AI (LLM-Tutor) in learning mathematical proof. Grounded in learning sciences, we designed an iterative AI tutoring system integrating LLM-driven automated proof grading and interactive mathematical Q&A. A mixed-methods approach—comprising controlled experiments, mediation analysis, and qualitative interviews—was employed to evaluate pedagogical impact. Results indicate significant improvements in homework performance, but no statistically significant gains in exam scores or time on task. Notably, students with lower self-efficacy engaged more frequently with the tools: their chatbot use partially mediated lower midterm scores, whereas their use of the proof-review tutor contributed positively to final exam scores. The key contribution lies in the first empirical integration of generative AI with self-efficacy theory, revealing how perceived competence shapes AI usage patterns and consequent learning trajectories—thereby informing differentiated design principles and ethically grounded deployment of educational AI.
📝 Abstract
We evaluate the effectiveness of LLM-Tutor, a large language model (LLM)-powered tutoring system that combines an AI-based proof-review tutor for real-time feedback on proof-writing with a chatbot for mathematics-related queries. Our experiment, involving 148 students, demonstrated that the use of LLM-Tutor significantly improved homework performance compared to a control group without access to the system. However, its impact on exam performance and time spent on tasks was insignificant. Mediation analysis revealed that students with lower self-efficacy tended to use the chatbot more frequently, which partially contributed to lower midterm scores. Furthermore, students with lower self-efficacy were more likely to engage frequently with the proof-review tutor, a usage pattern that positively contributed to higher final exam scores. Interviews with 19 students highlighted the accessibility of LLM-Tutor and its effectiveness in addressing learning needs, while also revealing limitations and concerns regarding potential over-reliance on the tool. Our results suggest that generative AI alone, such as a chatbot, may not suffice for comprehensive learning support, underscoring the need for iterative design improvements grounded in learning sciences principles for generative AI educational tools like LLM-Tutor.