🤖 AI Summary
To address the time-intensive and inefficient manual creation of retrieval practice questions in technical disciplines, this study proposes a pedagogical enhancement method that leverages large language models (LLMs) to automatically generate high-quality multiple-choice questions. Customized prompt engineering and rigorous human validation ensure question validity and reliability. Crucially, this work conducts the first controlled empirical study deploying LLM-generated retrieval questions in an authentic data science course. Results show that students practicing with LLM-generated questions achieved 89% knowledge retention accuracy, significantly higher than the 73% observed in the no-practice control condition (p < 0.01). The primary contribution is the first empirical evidence, gathered in a natural instructional setting, that LLM-generated retrieval questions enhance long-term memory retention. This advances the scalable, sustainable development of intelligent educational resources by establishing both empirical grounding and a practical implementation framework.
📝 Abstract
Retrieval practice is a well-established pedagogical technique known to significantly enhance student learning and knowledge retention. However, generating high-quality retrieval practice questions is often time-consuming and labor-intensive for instructors, especially in rapidly evolving technical subjects. Large Language Models (LLMs) offer the potential to automate this process by generating questions in response to prompts, yet the effectiveness of LLM-generated retrieval practice on student learning remains to be established. We conducted an empirical study in two college-level data science courses with approximately 60 students. We compared learning outcomes from one week in which students received LLM-generated multiple-choice retrieval practice questions to those from a week in which no such questions were provided. Results indicate that students exposed to LLM-generated retrieval practice achieved significantly higher knowledge retention, with an average accuracy of 89%, compared to 73% in the week without such practice. These findings suggest that LLM-generated retrieval questions can effectively support student learning and may provide a scalable solution for integrating retrieval practice into real-time teaching. Despite these encouraging outcomes and the potential time savings, however, caution is warranted: the quality of LLM-generated questions can vary, so instructors must still manually verify and revise the generated questions before releasing them to students.