🤖 AI Summary
This study addresses the low efficiency and uneven cognitive-level coverage of teacher-authored test items. We propose an automatic test item generation method that integrates Bloom's Taxonomy with large language models (LLMs): structured prompt engineering explicitly encodes Bloom's six cognitive levels into the generation process, enabling on-demand synthesis of multi-level, multi-format items. The method was evaluated empirically with frontline teachers through reliability analysis, item difficulty distribution assessment, and cognitive-level coverage evaluation. The results indicate that the generated items match or surpass human-authored items in reliability, difficulty appropriateness, and Bloom-level coverage. Teacher adoption intent is also high, which suggests strong potential for scalable classroom deployment. To our knowledge, this work is the first to achieve deep, structured integration of Bloom's Taxonomy into LLM-based test generation, accompanied by comprehensive pedagogical validation.
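As a rough illustration of what encoding Bloom's levels into a prompt could look like, the sketch below builds one prompt per level for a given passage. It is not the authors' prompt design. The level names follow the revised Bloom's taxonomy, which is an assumption, and the level descriptions, the `build_prompt` and `build_all_prompts` helpers, and the prompt wording are all hypothetical. No LLM is called; the code only produces the prompt strings.

```python
# Assumption: the six levels of the revised Bloom's taxonomy, each paired
# with a hypothetical one-line description of the expected student task.
BLOOM_LEVELS = {
    "remember": "recall a fact or definition stated in the passage",
    "understand": "explain an idea from the passage in their own words",
    "apply": "use a concept from the passage in a new situation",
    "analyze": "break down the passage and relate its parts to each other",
    "evaluate": "judge a claim from the passage and justify the judgment",
    "create": "combine ideas from the passage into something new",
}


def build_prompt(passage: str, level: str, item_format: str = "short answer") -> str:
    """Return a prompt asking for one question at the given Bloom level."""
    key = level.lower()
    if key not in BLOOM_LEVELS:
        raise ValueError(f"unknown Bloom level: {level!r}")
    return (
        f"Passage:\n{passage.strip()}\n\n"
        f"Write one {item_format} question at the '{key}' level of "
        f"Bloom's taxonomy. The question should ask the student to "
        f"{BLOOM_LEVELS[key]}.\n"
        "Question:"
    )


def build_all_prompts(passage: str, item_format: str = "short answer") -> dict:
    """Return a mapping from each Bloom level to its prompt for the passage."""
    return {
        level: build_prompt(passage, level, item_format)
        for level in BLOOM_LEVELS
    }
```

Calling `build_all_prompts(passage, "multiple choice")` would yield six prompts, one per level, which could then be sent to whichever LLM is in use.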
📝 Abstract
Question generation (QG) is a natural language processing task with many potential benefits and use cases in the educational domain. For this potential to be realized, QG systems must be designed and validated with pedagogical needs in mind. However, little research has assessed or designed QG approaches with input from real teachers or students. This paper applies a large language model-based QG approach in which questions are generated with learning goals derived from Bloom's taxonomy. The automatically generated questions are used in multiple experiments designed to assess how teachers use them in practice. The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes show no loss in quality compared to handwritten versions. Further, several metrics indicate that automatically generated questions can even improve the quality of the resulting quizzes, showing the promise of large-scale use of QG in the classroom setting.