🤖 AI Summary
Novice mathematics teachers often struggle to accurately diagnose and effectively address students' misconceptions, while existing large language models typically generate instructional feedback that lacks strong grounding in pedagogical knowledge and authentic error examples, resulting in overly generic and impractical guidance. To address this gap, this work proposes MisEdu-RAG, a novel framework built on a dual-layer structure comprising a concept hypergraph and an instance hypergraph. Through a two-stage retrieval mechanism, the framework explicitly links pedagogical principles with real student errors to generate specific and actionable feedback. Experiments on the MisstepMath dataset demonstrate a 10.95% improvement in token-F1 and up to a 15.3% gain across five dimensions of response quality. Teacher surveys and interviews further confirm the framework's advantages in diagnostic accuracy and instructional utility.
📝 Abstract
Novice math teachers often encounter student mistakes that are difficult to diagnose and remediate. Misconceptions are especially challenging because teachers must explain both what went wrong and how to correct it. Although many existing large language model (LLM) platforms can assist in generating instructional feedback, these LLMs only loosely connect pedagogical knowledge with student mistakes, which can make the guidance less actionable for teachers. To address this gap, we propose MisEdu-RAG, a dual-hypergraph-based retrieval-augmented generation (RAG) framework that organizes pedagogical knowledge as a concept hypergraph and real student mistake cases as an instance hypergraph. Given a query, MisEdu-RAG performs two-stage retrieval to gather connected evidence from both layers and generates a response grounded in the retrieved cases and pedagogical principles. We evaluate on *MisstepMath*, a dataset of math mistakes paired with teacher solutions, which serves as a benchmark for misconception-aware retrieval and response generation across topics and error types. On *MisstepMath*, MisEdu-RAG improves token-F1 by 10.95% over baseline models and yields up to 15.3% higher response quality across five dimensions, with the largest gains on *Diversity* and *Empowerment*. To verify its applicability in practice, we further conduct a pilot study comprising a questionnaire survey of 221 teachers and interviews with 6 novice teachers. The findings suggest that MisEdu-RAG provides diagnosis results and concrete teaching moves for high-demand misconception scenarios. Overall, MisEdu-RAG demonstrates strong potential for scalable teacher training and AI-assisted instruction in misconception handling. Our code is available on GitHub: https://github.com/GEMLab-HKU/MisEdu-RAG.
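The abstract's two-stage retrieval over a concept hypergraph and an instance hypergraph can be sketched in miniature. The sketch below is an illustrative assumption, not the paper's implementation: the hyperedges, the toy pedagogical data, and the simple token-overlap scorer are all hypothetical stand-ins. Stage 1 ranks concepts against the query; stage 2 follows the matched concepts into both layers to collect linked pedagogical principles and real mistake cases.

```python
# Hypothetical sketch of MisEdu-RAG-style two-stage dual-hypergraph retrieval.
# All names, example data, and the token-overlap scorer are illustrative
# assumptions, not the released code.

def tokens(text):
    return set(text.lower().split())

# Concept layer: each hyperedge is a pedagogical principle linking concepts.
concept_hyperedges = {
    "sign-rules": {"concepts": {"negative numbers", "subtraction"},
                   "principle": "contrast subtraction with adding the opposite"},
    "fraction-equivalence": {"concepts": {"fractions", "common denominator"},
                             "principle": "ground equivalence in area models"},
}

# Instance layer: each hyperedge is a real student-mistake case linking concepts.
instance_hyperedges = {
    "case-17": {"concepts": {"negative numbers", "subtraction"},
                "mistake": "student computes 3 - (-2) = 1"},
    "case-42": {"concepts": {"fractions", "common denominator"},
                "mistake": "student adds 1/2 + 1/3 = 2/5"},
}

def stage1_concepts(query, k=2):
    """Stage 1: score every concept node by token overlap with the query."""
    scores = {}
    for edge in concept_hyperedges.values():
        for concept in edge["concepts"]:
            scores[concept] = scores.get(concept, 0) + len(tokens(concept) & tokens(query))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def stage2_evidence(concepts):
    """Stage 2: follow shared concepts into both layers to gather evidence."""
    principles = [e["principle"] for e in concept_hyperedges.values()
                  if e["concepts"] & concepts]
    cases = [e["mistake"] for e in instance_hyperedges.values()
             if e["concepts"] & concepts]
    return principles, cases

query = "why does my student get subtraction with negative numbers wrong"
principles, cases = stage2_evidence(stage1_concepts(query))
# A generator would then condition its feedback on both `principles` and `cases`,
# which is what grounds the response in pedagogy plus authentic errors.
```

Connecting the two layers through shared concept nodes is the key design point: retrieved evidence arrives already linked, so the generated feedback can pair a principle with a concrete error rather than citing either in isolation.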