🤖 AI Summary
Existing automated math problem generation methods neglect pedagogical intent, support only unidimensional objectives, and suffer from misalignment between textual quality and educational appropriateness. To address these issues, we propose EQPR, an educational-goal-driven problem generation framework that introduces a "plan-evaluate-optimize" paradigm. Methodologically, EQPR integrates Monte Carlo Tree Search for education-goal-guided structured problem planning and leverages large language models for iterative self-reflective generation. Our contributions include: (1) EduMath, the first large-scale dataset comprising 16k problems annotated with fine-grained, three-dimensional educational goals (cognitive level, knowledge concept, and difficulty); and (2) EQGEVAL, a multidimensional alignment evaluation benchmark. Experiments demonstrate that EQPR significantly improves goal alignment on EQGEVAL and generates problems better aligned with the instructional context and the basic educational objectives.
📝 Abstract
Automatically generating high-quality mathematical problems that align with educational objectives is a crucial task in NLP-based educational technology. Traditional generation methods focus primarily on textual quality and often overlook educational objectives. Moreover, these methods handle only simple, single-dimensional question generation and fail to meet complex, multifaceted educational requirements. To address these challenges, we construct and annotate EduMath, a dataset of 16k mathematical questions with multi-dimensional educational objectives. Based on this dataset, we develop EQGEVAL, which incorporates three evaluation dimensions and is designed to assess the ability of models to generate educational questions. Drawing inspiration from teachers' problem design processes, we propose the Educational Question Planning with self-Reflection (EQPR) method for educational mathematical question generation, which follows a "plan-evaluate-optimize" approach. Specifically, by combining a planning algorithm based on Monte Carlo Tree Search with the generative capabilities of large language models, we continuously optimize questions through iterative feedback. This self-optimization mechanism ensures that the generated questions both fit the educational context and strategically achieve specific basic educational objectives. Extensive experiments on EQGEVAL demonstrate that EQPR achieves significant improvements in generating questions that meet multi-dimensional educational objectives.
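To make the planning step more concrete, the following is a minimal sketch of goal-guided Monte Carlo Tree Search over a toy plan space. It is not the EQPR implementation: the plan space `DIMENSIONS`, the scoring function `evaluate`, and the helper `plan_with_mcts` are all hypothetical stand-ins, and the LLM-based generation and reflection steps are left out entirely. The sketch only illustrates how a search can be steered by an evaluator that scores alignment with a target educational goal.

```python
import math
import random

# Hypothetical plan space (an assumption, not taken from the paper):
# one choice per goal dimension, made in this fixed order.
DIMENSIONS = [
    ("cognitive_level", ["remember", "apply", "analyze"]),
    ("knowledge_concept", ["fractions", "linear_equations", "probability"]),
    ("difficulty", ["easy", "medium", "hard"]),
]


def evaluate(plan, target):
    """Toy evaluator: fraction of goal dimensions on which the plan matches the target."""
    hits = sum(1 for (name, _), choice in zip(DIMENSIONS, plan) if target[name] == choice)
    return hits / len(DIMENSIONS)


class Node:
    def __init__(self, plan, parent=None):
        self.plan = plan        # tuple of choices made so far
        self.parent = parent
        self.children = {}      # choice -> Node
        self.visits = 0
        self.total = 0.0        # sum of rewards backed up through this node

    def untried(self):
        """Choices for the next dimension that have no child node yet."""
        if len(self.plan) == len(DIMENSIONS):
            return []
        return [o for o in DIMENSIONS[len(self.plan)][1] if o not in self.children]


def uct(child, parent_visits, c=1.4):
    """Upper confidence bound: mean reward plus an exploration bonus."""
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(plan, rng):
    """Complete a partial plan with uniformly random choices."""
    plan = list(plan)
    while len(plan) < len(DIMENSIONS):
        plan.append(rng.choice(DIMENSIONS[len(plan)][1]))
    return tuple(plan)


def plan_with_mcts(target, iterations=300, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_plan, best_score = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: descend by UCT while the node is fully expanded and not terminal.
        while not node.untried() and node.children:
            node = max(node.children.values(), key=lambda ch: uct(ch, node.visits))
        # Expansion: add one untried child, if any remain.
        options = node.untried()
        if options:
            choice = rng.choice(options)
            node.children[choice] = Node(node.plan + (choice,), parent=node)
            node = node.children[choice]
        # Simulation: finish the plan at random and score it against the goal.
        full_plan = rollout(node.plan, rng)
        score = evaluate(full_plan, target)
        if score > best_score:
            best_plan, best_score = full_plan, score
        # Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    names = [name for name, _ in DIMENSIONS]
    return dict(zip(names, best_plan)), best_score


if __name__ == "__main__":
    goal = {"cognitive_level": "apply", "knowledge_concept": "fractions", "difficulty": "hard"}
    print(plan_with_mcts(goal))
```

In a full pipeline of the kind the abstract describes, the selected plan would presumably be handed to a large language model to draft a question, and the evaluator's feedback would drive further revisions. Here the evaluator is a simple match count so that the search loop can run on its own.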