Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

📅 2026-01-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes PedagogicalRL-Thinking, a framework that integrates educational theory into the internal reasoning of large language models (LLMs), addressing a common gap in existing approaches, which prioritize output correctness over pedagogically sound reasoning. By combining instruction-guided reasoning prompts with a reinforcement learning reward tailored to teaching principles, the framework fine-tunes LLMs to optimize their reasoning trajectories. Evaluated on math tutoring tasks, the resulting models achieve significant gains on unseen educational benchmarks while retaining their original factual knowledge, and they generate reasoning steps with greater pedagogical structure and logical coherence.

๐Ÿ“ Abstract
Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent works have proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education through two novel approaches: (1) Pedagogical Reasoning Prompting, which guides internal reasoning using domain-specific educational theory rather than generic instructions; and (2) Thinking Reward, which explicitly evaluates and reinforces the pedagogical quality of the model's reasoning traces. Our experiments reveal that domain-specific, theory-grounded prompting outperforms generic prompting, and that Thinking Reward is most effective when combined with pedagogical prompting. Furthermore, models trained only on mathematics tutoring dialogues show improved performance on educational benchmarks not seen during training, while preserving the base model's factual knowledge. Our quantitative and qualitative analyses reveal that pedagogical thinking reward produces systematic reasoning trace changes, with increased pedagogical reasoning and more structured instructional decision-making in the tutor's thinking process.
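The Thinking Reward described in the abstract scores the pedagogical quality of the model's hidden reasoning trace and combines it with a reward on the visible tutor response. A minimal sketch of that combination is below; the function names, the keyword heuristic standing in for the paper's trace judge, and the weight `alpha` are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of a combined reward: visible-response reward plus a "thinking
# reward" scored on the reasoning trace. All scoring logic here is a toy
# stand-in for the learned/LLM judges a real RL pipeline would use.

def pedagogical_thinking_reward(trace: str) -> float:
    """Toy proxy: fraction of pedagogical moves mentioned in the trace.

    The paper evaluates traces against educational theory; this keyword
    check is purely for illustration.
    """
    moves = ["scaffold", "hint", "misconception", "check understanding"]
    hits = sum(1 for move in moves if move in trace.lower())
    return hits / len(moves)

def response_reward(response: str, reference: str) -> float:
    """Toy proxy for response quality: exact match against a reference."""
    return 1.0 if response.strip() == reference.strip() else 0.0

def combined_reward(trace: str, response: str, reference: str,
                    alpha: float = 0.5) -> float:
    """Weighted sum of response and thinking rewards (alpha is assumed)."""
    return ((1 - alpha) * response_reward(response, reference)
            + alpha * pedagogical_thinking_reward(trace))

trace = "First probe the student's misconception, then offer a hint."
print(combined_reward(trace, "x = 4", "x = 4"))  # 0.75
```

With `alpha = 0.5`, a correct response (reward 1.0) and a trace naming two of the four pedagogical moves (reward 0.5) combine to 0.75, so the optimizer is pushed toward traces that reason pedagogically rather than only toward correct answers.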
Problem

Research questions and friction points this paper is trying to address.

pedagogical reasoning
thinking process
large language models
intelligent tutoring systems
educational alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pedagogical Reasoning Prompting
Thinking Reward
Reinforcement Learning for LLMs
Educational Alignment
Reasoning Traces
Authors

Unggi Lee, Chosun University
Jiyeong Bae, Korea University (Machine Learning)
Jaehyeon Park, Seoul National University
Haeun Park, Korea Institute for Curriculum and Evaluation
Taejun Park, Seoul National University
Younghoon Jeon, Upstage
Sungmin Cho, Delvine Inc.
Junbo Koh, Educational Technology, Seoul National University (ISD, AIED, Learning Sciences, LLM/LMM)
Yeil Jeong, Indiana University (AI in Education, Human-AI Interaction, Domain-specific LLMs)
Gyeonggeon Lee, Nanyang Technological University