🤖 AI Summary
Pedagogical conversational agents (PCAs) lack fine-grained validation mechanisms tailored to students' individual knowledge levels and motivational traits. Method: This paper introduces TeachTune, a controllable student-simulation framework powered by large language models (LLMs). It parametrically models student profiles, integrating knowledge proficiency and motivational characteristics, and leverages prompt engineering, behavior-aligned fine-tuning, and automated multi-turn dialogue simulation to generate high-fidelity student responses. Results: Simulated students' behaviors match their prescribed profiles within 5% (knowledge level) and 10% (motivational traits) accuracy gaps. A between-subjects study with thirty science teachers shows reduced task load and improved coverage of student-profile diversity over a baseline. This work establishes a configurable, reproducible, and scalable automated validation paradigm for adaptive PCA assessment.
📝 Abstract
Large language models (LLMs) can empower teachers to build pedagogical conversational agents (PCAs) customized for their students. As students differ in prior knowledge and motivation levels, teachers must review how well their PCAs adapt to diverse students. Existing chatbot reviewing methods (e.g., direct chat and benchmarks) are either manually intensive across multiple iterations or limited to testing single-turn interactions. We present TeachTune, where teachers can create simulated students and review PCAs by observing automated chats between PCAs and simulated students. Our technical pipeline instructs an LLM-based student to simulate prescribed knowledge levels and traits, helping teachers explore diverse conversation patterns. The pipeline produces simulated students whose behaviors correlate highly with their input knowledge and motivation levels, within 5% and 10% accuracy gaps, respectively. Thirty science teachers designed PCAs in a between-subjects study, and using TeachTune resulted in lower task load and higher student-profile coverage than a baseline.
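The core idea of the pipeline, instructing an LLM to role-play a student with a prescribed profile and then automating a multi-turn chat between that student and a PCA, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `StudentProfile` fields, the prompt wording, and the `simulate_chat` loop are hypothetical, and the PCA/student callables stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StudentProfile:
    """Hypothetical parametric profile; the paper models knowledge and motivation."""
    knowledge_level: str  # e.g., "low", "medium", "high"
    motivation: str       # e.g., "disengaged", "curious"

def build_student_prompt(profile: StudentProfile) -> str:
    """Compose a system prompt instructing an LLM to act as the simulated student."""
    return (
        "You are role-playing a student chatting with a tutoring agent.\n"
        f"- Knowledge level: {profile.knowledge_level}\n"
        f"- Motivation: {profile.motivation}\n"
        "Stay in character: answer only at your knowledge level and show "
        "engagement consistent with your motivation."
    )

Turn = Tuple[str, str]  # (speaker, utterance)

def simulate_chat(
    pca_reply: Callable[[List[Turn]], str],
    student_reply: Callable[[List[Turn]], str],
    opening: str,
    turns: int = 3,
) -> List[Turn]:
    """Alternate PCA and simulated-student turns; return the full transcript."""
    transcript: List[Turn] = [("pca", opening)]
    for _ in range(turns):
        transcript.append(("student", student_reply(transcript)))
        transcript.append(("pca", pca_reply(transcript)))
    return transcript
```

A teacher-facing tool would render such transcripts for review; swapping profiles in `build_student_prompt` is what makes the simulation controllable.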