AI Summary
This study asks whether large language models (LLMs) can maintain consistent behavior when simulating students with attention-deficit/hyperactivity disorder (ADHD) over extended educational interactions. To quantify temporal stability both within and across conversations, the authors introduce a dual-assessment framework that combines self-reported characteristics with observer-rated behavioral expression. Across samples totaling N = 8,920 (4,968 between-conversation and 3,952 within-conversation), spanning five LLMs, three prompt designs, and four clinically grounded ADHD persona conditions, they find that self-reported characteristics remain stable for high-intensity personas. Observer-rated behavior, however, drifts within unscripted conversations for high- and moderate-intensity personas. Scripted interactions with explicit task prompts eliminate this drift entirely. These findings point to structured interaction design as a key principle for building trustworthy, consistent agent-based simulations in educational settings.
Abstract
Student simulation with large language models (LLMs) offers a scalable alternative for educational research and teacher training. Yet its validity depends on whether models maintain stable personas across extended interactions. We test this prerequisite using a dual-assessment framework that measures self-reported characteristics and observer-rated behavioral expression. Across two experiments testing four clinically grounded ADHD persona conditions, five LLMs, and three prompt designs, we quantify between-conversation stability (N = 4,968) and within-conversation stability (N = 3,952 across 9 turns). Self-reported characteristics remain stable for high-intensity personas, satisfying a necessary prerequisite for valid behavioral simulation. Observer-rated behavioral expression, however, reveals selective instability: within-conversation drift occurs in unscripted dialogue for high- and moderate-intensity ADHD personas. Scripted interactions with explicit task prompts eliminate this drift entirely. Stable, persona-aligned simulated learners therefore benefit from structured interaction design that maintains behavioral coherence, which has significant implications for teacher training, adaptive tutoring, and any application requiring sustained, path-dependent learner interactions.
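The abstract does not say how between- and within-conversation stability are computed. As a hedged illustration only, assuming each assessment yields a numeric score on a hypothetical 1 to 5 scale, one simple approach is to measure between-conversation stability as the spread of a persona's scores across independent conversations, and within-conversation drift as the least-squares slope of scores against turn index over the 9 turns. The function names and the scoring scale are hypothetical, and this is a sketch, not the authors' method.

```python
from statistics import mean, pstdev


def between_conversation_spread(scores: list[float]) -> float:
    """Population standard deviation of one persona's scores across
    independent conversations (hypothetical metric). Lower values mean
    the persona is expressed more consistently from run to run."""
    return pstdev(scores)


def within_conversation_drift(turn_scores: list[float]) -> float:
    """Ordinary least-squares slope of scores against turn index 1..T
    (hypothetical metric). A slope near zero suggests no systematic drift;
    a negative slope means the rated expression weakens over the dialogue."""
    n = len(turn_scores)
    if n < 2:
        raise ValueError("need at least two turns to estimate drift")
    turns = list(range(1, n + 1))
    t_bar = mean(turns)
    s_bar = mean(turn_scores)
    cov = sum((t - t_bar) * (s - s_bar) for t, s in zip(turns, turn_scores))
    var = sum((t - t_bar) ** 2 for t in turns)
    return cov / var


if __name__ == "__main__":
    # Invented 9-turn observer ratings, for illustration only.
    declining = [4.0, 4.0, 3.5, 3.5, 3.0, 3.0, 2.5, 2.5, 2.0]
    steady = [3.0] * 9
    print(round(within_conversation_drift(declining), 3))  # -0.25
    print(within_conversation_drift(steady))  # 0.0
    print(between_conversation_spread([2.0, 4.0]))  # 1.0
```

A linear slope only captures monotonic drift; a persona that oscillates across turns would need a different measure, such as per-turn variance or a nonparametric trend test.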