🤖 AI Summary
This work addresses a limitation of current large language models: despite generating fluent explanations, they often fail to support learners' metacognitive processes (such as planning, monitoring, and evaluation) and risk disrupting adaptive learning through excessive intervention. To tackle this, the authors propose MetaCLASS, a framework that formalizes metacognitive tutoring as a selection task over 11 interpretable instructional actions. MetaCLASS employs a two-stage architecture: it first plans an instructional trajectory based on the learner's profile, then generates naturalistic dialogue aligned with that trajectory. The study introduces the first high-quality dataset of its kind, comprising 1,015 annotated dialogues (7,711 turns) with turn-level metacognitive labels. Benchmark evaluations reveal that even the best-performing model achieves only 43.2% accuracy in action prediction, and that models intervene unnecessarily in 95.8% of situations where silence is optimal, highlighting a significant intervention bias.
📝 Abstract
Large language models can generate fluent explanations, but effective tutoring requires supporting the learner's thought process, not just delivering content. Metacognitive tutoring targets this gap by prompting planning, monitoring, debugging, and evaluation, and, crucially, by deciding when to be active versus minimally present based on learner signals and trajectory. We introduce MetaCLASS, a learning-science-grounded framework that formulates metacognitive tutoring as move selection over 11 interpretable actions aligned with self-regulated learning processes. MetaCLASS uses a two-phase pipeline that first plans a pedagogical trajectory conditioned on learner profiles (calibration, help-seeking) and then generates natural dialogue consistent with that plan. This yields a dataset of 1,015 conversations (7,711 turns) annotated with turn-level metacognitive labels and validated for pedagogical contingency and trajectory adherence. We benchmark nine LLMs on predicting the next coach move given the problem and dialogue context. The best model achieves only 43.2\% accuracy, and models exhibit a compulsive intervention bias: in turns where effective metacognitive tutoring requires silence (41.7\% of cases), models predict `no intervention' only 4.2\% of the time, while severely over-predicting high-intervention moves. These results show that traditional content-based tutoring ability does not translate into metacognitive tutoring competence, positioning MetaCLASS as a testbed for developing intelligent tutors that promote self-regulated learning.
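The two benchmark metrics described above, overall next-move accuracy and how often a model correctly stays silent, can be sketched as a small evaluation routine. This is an illustrative sketch, not the paper's actual evaluation code; the action label `no_intervention` is a hypothetical name standing in for whatever identifier the MetaCLASS action set uses for the silent move.

```python
def evaluate_moves(preds, golds, silent_label="no_intervention"):
    """Score predicted coach moves against gold turn-level labels.

    Returns (accuracy, silence_recall):
      - accuracy: fraction of turns where the predicted move matches gold
      - silence_recall: among turns whose gold move is silence, the
        fraction where the model actually predicted silence (the paper
        reports this at only 4.2% for the turns requiring silence)
    """
    assert len(preds) == len(golds) and golds, "need aligned, non-empty lists"
    accuracy = sum(p == g for p, g in zip(preds, golds)) / len(golds)
    silent_turns = [p for p, g in zip(preds, golds) if g == silent_label]
    silence_recall = (
        sum(p == silent_label for p in silent_turns) / len(silent_turns)
        if silent_turns else 0.0
    )
    return accuracy, silence_recall
```

A model with a strong intervention bias scores near zero on `silence_recall` even when its overall accuracy looks moderate, which is the gap the benchmark is designed to expose.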