AI Summary
Current large language models (LLMs) do not explicitly model how real students learn, which limits their capacity for high-quality personalized instruction. To address this, we propose a parameter-efficient fine-tuning framework grounded in real-world one-on-one teacher-student dialogue data, yielding an education-specific LLM with pedagogical awareness. Our approach introduces a high-fidelity synthetic dialogue generation paradigm that models student behavior from anonymized, large-scale tutoring interaction data. It also establishes an automated multi-turn pedagogical dialogue evaluation protocol, and it moves beyond prompt engineering, which is limited in its ability to capture complex instructional strategies. Experiments demonstrate substantial improvements in teaching interaction quality: student talk time doubles, questioning style improves, the number of dialogue turns rises by 50%, and instruction becomes more personalized.
Abstract
The promise of generative AI to revolutionize education is constrained by the pedagogical limits of large language models (LLMs). A major issue is the lack of access to high-quality training data that reflect the learning of actual students. Prompt engineering has emerged as a stopgap, but the ability of prompts to encode complex pedagogical strategies in rule-based natural language is inherently limited. To address this gap, we introduce TeachLM, an LLM optimized for teaching through parameter-efficient fine-tuning of state-of-the-art models. TeachLM is trained on a dataset comprising 100,000 hours of one-on-one, longitudinal student-tutor interactions maintained by Polygence, which underwent a rigorous anonymization process to protect privacy. We use parameter-efficient fine-tuning to develop an authentic student model that enables the generation of high-fidelity synthetic student-tutor dialogues. Building on this capability, we propose a novel multi-turn evaluation protocol that leverages synthetic dialogue generation to provide fast, scalable, and reproducible assessments of the dialogical capabilities of LLMs. Our evaluations demonstrate that fine-tuning on authentic learning data significantly improves conversational and pedagogical performance: it doubles student talk time, improves questioning style, increases dialogue turns by 50%, and yields greater personalization of instruction.
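To make the reported conversational measures concrete, the following is a minimal sketch of how statistics such as student talk share and turn count could be computed from a single transcript. It is not the evaluation protocol described in the abstract: the `Turn` structure, the speaker labels, the `dialogue_metrics` function, and the use of word counts as a proxy for talk time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # hypothetical labels: "student" or "tutor"
    text: str


def dialogue_metrics(turns: list[Turn]) -> dict[str, float]:
    """Compute simple conversational statistics for one dialogue.

    Assumption: word count stands in for talk time, since a text
    transcript carries no timing information.
    """
    student_words = sum(len(t.text.split()) for t in turns if t.speaker == "student")
    total_words = sum(len(t.text.split()) for t in turns)
    tutor_turns = [t for t in turns if t.speaker == "tutor"]
    # Crude proxy for questioning behavior: share of tutor turns containing "?".
    tutor_questions = sum(1 for t in tutor_turns if "?" in t.text)
    return {
        "num_turns": float(len(turns)),
        "student_talk_share": student_words / total_words if total_words else 0.0,
        "tutor_question_rate": tutor_questions / len(tutor_turns) if tutor_turns else 0.0,
    }


# Toy transcript, invented for this example only.
dialogue = [
    Turn("tutor", "What do you already know about photosynthesis?"),
    Turn("student", "Plants use sunlight to make food."),
    Turn("tutor", "Good. Which gas do they take in?"),
    Turn("student", "Carbon dioxide."),
]
metrics = dialogue_metrics(dialogue)
print(metrics)
```

Averaging such statistics over many synthetic dialogues, once for a base model and once for a fine-tuned one, would give a comparison in the spirit of the reported results, although the actual metrics and aggregation used for TeachLM are not specified here.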