Do Large Language Models Mentalize When They Teach?

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) possess theory-of-mind-like capabilities in instructional decision-making, specifically the ability to reason about learners' knowledge states, or whether they rely solely on heuristic strategies. By simulating teaching tasks on reward-labeled directed graphs, the authors tasked LLMs with selecting informative edges, based on learner trajectories, to guide learners toward better plans. For the first time, cognitive modeling approaches from human pedagogy research, such as the Bayesian optimal teacher model, were systematically applied to LLMs. Model comparison using the Bayesian Information Criterion (BIC) revealed that most LLMs' teaching behavior aligned best with the Bayes-optimal model and closely resembled human performance. Although scaffolding prompts modulated response styles, they did not consistently enhance teaching efficacy and occasionally degraded performance. These findings suggest that LLMs may exhibit latent theory-of-mind abilities, while also highlighting that prompt compliance does not necessarily equate to effective teaching.
📝 Abstract
How do LLMs decide what to teach next: by reasoning about a learner's knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner's trajectory through a reward-annotated directed graph and must reveal a single edge so that the learner would choose a better path if they replanned. We run a range of LLMs as simulated teachers and fit their trial-by-trial choices with the same cognitive models used for humans: a Bayes-Optimal teacher that infers which transitions the learner is missing (inverse planning), weaker Bayesian variants, heuristic baselines (e.g., reward-based), and non-mentalizing utility models. In a baseline experiment matched to the stimuli presented to human subjects, most LLMs perform well, show little change in strategy over trials, and their graph-by-graph performance is similar to that of humans. Model comparison (BIC) shows that Bayes-Optimal teaching best explains most models' choices. When given a scaffolding intervention, models follow auxiliary inference- or reward-focused prompts, but these scaffolds do not reliably improve later teaching on heuristic-incongruent test graphs and can sometimes reduce performance. Overall, cognitive model fits provide insight into LLM tutoring policies and show that prompt compliance does not guarantee better teaching decisions.
Problem

Research questions and friction points this paper is trying to address.

mentalizing
large language models
teaching strategies
cognitive modeling
Bayesian inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

mentalizing
Bayes-Optimal teaching
cognitive modeling
large language models
scaffolding intervention