Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes PedagogicalRL-Thinking, a framework that brings educational theory into the internal reasoning of large language models (LLMs). Existing approaches to training LLM tutors optimize the visible response and overlook whether the reasoning behind it is pedagogically sound. The framework combines theory-grounded reasoning prompts with a reinforcement learning reward that scores the pedagogical quality of the reasoning trace. Trained only on math tutoring dialogues, the resulting models improve on educational benchmarks not seen during training, retain the base model's factual knowledge, and produce reasoning traces with more pedagogical reasoning and more structured instructional decision-making.

📝 Abstract

Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent work has proposed reinforcement learning approaches for training LLM tutors, but these methods optimize only the visible response and neglect the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education through two novel approaches: (1) Pedagogical Reasoning Prompting, which guides internal reasoning using domain-specific educational theory rather than generic instructions; and (2) Thinking Reward, which explicitly evaluates and reinforces the pedagogical quality of the model's reasoning traces. Our experiments show that domain-specific, theory-grounded prompting outperforms generic prompting, and that Thinking Reward is most effective when combined with pedagogical prompting. Furthermore, models trained only on mathematics tutoring dialogues improve on educational benchmarks not seen during training while preserving the base model's factual knowledge. Our quantitative and qualitative analyses show that the Thinking Reward systematically changes the reasoning traces, with more pedagogical reasoning and more structured instructional decision-making in the tutor's thinking process.
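To make the Thinking Reward idea concrete, the following is a minimal sketch of how a reward on the reasoning trace could be combined with a reward on the visible response. It is not the paper's implementation: the `<think>...</think>` delimiters, the additive combination, the `thinking_weight` parameter, and both toy scorers are assumptions introduced purely for illustration. The abstract does not specify how either reward is computed.

```python
import re
from typing import Callable, Tuple

# Assumption: the reasoning trace is wrapped in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_trace(output: str) -> Tuple[str, str]:
    """Separate the reasoning trace from the visible tutor reply."""
    match = THINK_PATTERN.search(output)
    if match is None:
        # No trace found: treat the whole output as the visible reply.
        return "", output.strip()
    thinking = match.group(1).strip()
    response = (output[:match.start()] + output[match.end():]).strip()
    return thinking, response


def combined_reward(
    output: str,
    response_scorer: Callable[[str], float],
    thinking_scorer: Callable[[str], float],
    thinking_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Score the visible reply and the reasoning trace, then add them."""
    thinking, response = split_trace(output)
    return response_scorer(response) + thinking_weight * thinking_scorer(thinking)


# Hypothetical placeholder scorers; a real system would need a far
# richer measure of pedagogical quality than keyword checks.
def toy_response_scorer(response: str) -> float:
    """Reward replies that end by asking the student a question."""
    return 1.0 if response.endswith("?") else 0.0


def toy_thinking_scorer(thinking: str) -> float:
    """Reward traces that plan to guide rather than tell."""
    return 1.0 if "guiding question" in thinking.lower() else 0.0


sample = (
    "<think>The student dropped a sign. I should ask a guiding question "
    "instead of giving the answer.</think>"
    "What happens to the sign when you move 3 to the other side?"
)
reward = combined_reward(sample, toy_response_scorer, toy_thinking_scorer)
```

Scoring the trace separately means two outputs with identical visible replies can receive different rewards, which is the distinction the abstract draws against methods that optimize only the visible response.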

Problem

Research questions and friction points this paper is trying to address.

pedagogical reasoning

thinking process

large language models

intelligent tutoring systems

educational alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pedagogical Reasoning Prompting

Thinking Reward

Reinforcement Learning for LLMs