🤖 AI Summary
This work addresses the problem that current large language models struggle to hold a stable novice-level knowledge state when simulating learners, often undermining pedagogical authenticity by inadvertently revealing expert knowledge. To overcome this limitation, the study pairs machine unlearning with relearning: the model first undergoes targeted unlearning to regress it to a genuine beginner state, and its subsequent knowledge reconstruction is then observed in a structured instructional setting. The methodology combines machine unlearning, fine-tuning, multi-turn dialogue analysis, and a dataset of multiple-choice questions on Python programming. Experimental results show that the unlearned agent significantly outperforms prompt-engineering baselines, exhibiting more authentic novice behaviours and identifiable patterns of conceptual change during tutoring interactions, thereby establishing a new paradigm for AI-driven learner modelling.
📝 Abstract
Student simulation can support learning-by-teaching pedagogy, in which human students (as tutors) teach AI-simulated novice students (as tutees). Recent research often relies on prompt engineering with large language models (LLMs) to simulate novice student behaviour, but keeping the AI-simulated student at a stable novice knowledge level is difficult. A key reason is that many LLMs are trained to be broadly capable, so even when prompted to "act like a novice," they can still produce expert-level explanations during the learning-by-teaching interaction. As a result, the AI-simulated student may drift beyond the intended knowledge level, reducing the credibility of the simulation for studying learning-by-teaching processes. We therefore propose a knowledge-level simulation approach based on machine unlearning, which we investigate using a dataset of multiple-choice questions on Python programming concepts. We apply machine unlearning to transform a knowledgeable LLM into a novice-level AI student (i.e., a teachable agent), then evaluate whether the agent can relearn targeted knowledge components through learning-by-teaching dialogue. Finally, we analyse the dialogue logs to characterise how the agent's behaviour changes over time, including its question-asking, error patterns, and responsiveness to instruction. The results show that (1) unlearning produces simulated student agents with more novice-like responses than prompt-only baselines, (2) the agents recover a measurable portion of the unlearned knowledge under structured exposure, and (3) dialogue analyses reveal identifiable trajectories of conceptual change and teaching moves that predict learning recovery.
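The core "unlearn to regress the model to a novice state" step can be illustrated on a toy model. The abstract does not specify the unlearning algorithm used, so the sketch below assumes one common formulation, gradient *ascent* on a forget set (fine-tuning that maximises, rather than minimises, the loss on the knowledge to be removed); the 1-D logistic model here is a hypothetical stand-in for the LLM, not the paper's actual setup:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_loss(w, data):
    # Average negative log-likelihood of the *correct* answers in the
    # forget set: a low loss means the model still "knows" this material.
    return -sum(math.log(sigmoid(y * w * x)) for x, y in data) / len(data)

random.seed(0)
# "Forget set": items covering the knowledge components to unlearn
# (all labelled correct, y = 1, since the pretrained model answers them well).
forget_set = [(random.uniform(0.5, 1.5), 1) for _ in range(16)]

w = 2.0  # pretrained weight: the model initially fits the forget set
before = forget_loss(w, forget_set)

lr = 0.3
for _ in range(20):
    # dL/dw of the forget loss for the logistic model
    grad = sum(-y * x * (1 - sigmoid(y * w * x))
               for x, y in forget_set) / len(forget_set)
    w += lr * grad  # '+' not '-': ascend the loss, i.e. unlearn

after = forget_loss(w, forget_set)
print(f"forget-set loss before={before:.3f} after={after:.3f}")
```

After the loop the loss on the forget set has risen sharply, i.e. the toy model now behaves like a novice on exactly that material; in the paper's pipeline the analogous unlearned LLM is then handed to human tutors, and its relearning is measured from the dialogue logs.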