Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark

πŸ“… 2025-05-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Large reasoning models (LRMs) exhibit weak pedagogical coherence, deficient knowledge-transmission logic, and inadequate simulation of teacher behaviors in educational settings. Method: This paper proposes a teaching-aligned distillation fine-tuning paradigm: (1) constructing WBEB, a multi-dimensional benchmark for evaluating educational capabilities; (2) designing Chain-of-Pedagogy (CoP), a structured prompting strategy that emulates instructional reasoning; and (3) integrating model distillation with instruction tuning to explicitly model pedagogical behaviors. The approach combines quantitative evaluation with qualitative analysis across five core educational tasks. Results: Experiments demonstrate significant improvements in teaching consistency, decision traceability, and reasoning plausibility. This work provides the first systematic characterization of the strengths and critical limitations of LRMs in pedagogical competence, establishing both theoretical foundations and actionable technical pathways for the trustworthy adaptation of large models to education.

πŸ“ Abstract
Recent advances in large reasoning models (LRMs) show strong performance in structured domains such as mathematics and programming; however, they often lack pedagogical coherence and realistic teaching behaviors. To bridge this gap, we introduce Pedagogy-R1, a framework that adapts LRMs for classroom use through three innovations: (1) a distillation-based pipeline that filters and refines model outputs for instruction-tuning, (2) the Well-balanced Educational Benchmark (WBEB), which evaluates performance across subject knowledge, pedagogical knowledge, tracing, essay scoring, and teacher decision-making, and (3) a Chain-of-Pedagogy (CoP) prompting strategy for generating and eliciting teacher-style reasoning. Our mixed-method evaluation combines quantitative metrics with qualitative analysis, providing the first systematic assessment of LRMs' pedagogical strengths and limitations.
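The abstract's first innovation, a distillation-based pipeline that filters and refines model outputs for instruction-tuning, can be sketched in miniature. The paper does not specify its filtering criteria, so the heuristics, function names, and data fields below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of a distillation-style filtering step: collect reasoning
# traces produced by a large model, keep only those that pass basic quality
# checks, and store the survivors as (instruction, output) pairs for tuning.
# All names and filter rules here are assumptions for illustration.

def filter_for_instruction_tuning(samples, min_steps=2):
    """Keep (prompt, response) pairs whose responses look pedagogically usable."""
    kept = []
    for s in samples:
        response = s["response"]
        # Stand-in heuristics: require a correct answer and at least
        # min_steps non-empty lines of reasoning (multi-step explanation).
        steps = [line for line in response.splitlines() if line.strip()]
        if s.get("answer_correct", False) and len(steps) >= min_steps:
            kept.append({"instruction": s["prompt"], "output": response})
    return kept

raw = [
    {"prompt": "Explain fractions to a 4th grader.",
     "response": "Step 1: Start with pizza slices.\nStep 2: Compare 1/2 and 1/4.",
     "answer_correct": True},
    {"prompt": "Solve 2 + 2.", "response": "4", "answer_correct": True},
]
tuned = filter_for_instruction_tuning(raw)  # only the multi-step trace survives
```

In a real pipeline the filters would be model- or rubric-based rather than line counts, but the shape of the data (prompt in, filtered instruction-tuning pair out) is the same.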
Problem

Research questions and friction points this paper is trying to address.

LRMs lack pedagogical coherence in teaching
Need balanced evaluation across educational metrics
Require teacher-style reasoning generation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distillation-based pipeline for instruction-tuning
Well-balanced Educational Benchmark for evaluation
Chain-of-Pedagogy prompting for teacher-style reasoning
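The Chain-of-Pedagogy idea, prompting the model to reason like a teacher before answering, can be sketched as a prompt template. The exact template is not given in this summary, so the structure and wording below are an assumption:

```python
# Illustrative Chain-of-Pedagogy-style prompt builder. The three instructional
# reasoning steps are a plausible reading of "teacher-style reasoning", not
# the paper's actual template.

def chain_of_pedagogy_prompt(question: str, grade_level: str) -> str:
    return (
        f"You are an experienced teacher working with a {grade_level} student.\n"
        f"Question: {question}\n"
        "Before answering, reason like a teacher:\n"
        "1. Diagnose what the student likely knows and misunderstands.\n"
        "2. Choose an instructional strategy (example, analogy, scaffold).\n"
        "3. Explain step by step, checking understanding along the way.\n"
    )

prompt = chain_of_pedagogy_prompt("Why do seasons change?", "5th-grade")
```

The same template can serve both roles the abstract mentions: generating teacher-style reasoning traces for distillation, and eliciting them at inference time.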
Unggi Lee, Enuma, Inc.
Jaeyong Lee, Seoul National University
Jiyeong Bae, Korea University
Yeil Jeong, Indiana University
Junbo Koh, Educational Technology, Seoul National University
Gyeonggeon Lee, Nanyang Technological University
Gunho Lee, Enuma, Inc.
Taekyung Ahn, Enuma, Inc.
Hyeoncheol Kim, Korea University