🤖 AI Summary
Large reasoning models (LRMs) exhibit weak pedagogical coherence, deficient knowledge-transmission logic, and inadequate simulation of teacher behaviors in educational settings.
Method: This paper proposes a teaching-aligned distillation fine-tuning paradigm: (1) constructing WBEB, the first multi-dimensional benchmark for evaluating educational capabilities; (2) designing Chain-of-Pedagogy (CoP), a structured prompting strategy that emulates instructional reasoning; and (3) integrating model distillation with instruction tuning to explicitly model pedagogical behaviors. The approach combines quantitative evaluation and qualitative analysis across five core educational tasks.
Results: Experiments demonstrate significant improvements in teaching consistency, decision traceability, and reasoning plausibility. For the first time, this work systematically characterizes the strengths and critical limitations of LRMs in pedagogical competence, establishing both theoretical foundations and actionable technical pathways for trustworthy adaptation of large models to education.
📄 Abstract
Recent advances in large reasoning models (LRMs) show strong performance in structured domains such as mathematics and programming; however, they often lack pedagogical coherence and realistic teaching behaviors. To bridge this gap, we introduce Pedagogy-R1, a framework that adapts LRMs for classroom use through three innovations: (1) a distillation-based pipeline that filters and refines model outputs for instruction tuning, (2) the Well-balanced Educational Benchmark (WBEB), which evaluates performance across subject knowledge, pedagogical knowledge, knowledge tracing, essay scoring, and teacher decision-making, and (3) a Chain-of-Pedagogy (CoP) prompting strategy for generating and eliciting teacher-style reasoning. Our mixed-method evaluation combines quantitative metrics with qualitative analysis, providing the first systematic assessment of LRMs' pedagogical strengths and limitations.
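The abstract does not specify the CoP template itself, so as a rough illustration only, a teacher-style prompt of this kind might be assembled from a sequence of instructional stages. The stage names (`Diagnose`, `Plan`, `Explain`, `Check`) and all wording below are assumptions for the sketch, not the paper's actual prompt:

```python
# Hypothetical sketch of a Chain-of-Pedagogy-style prompt builder.
# The stage names and their descriptions are illustrative assumptions;
# the paper's actual CoP template may differ.
COP_STAGES = [
    ("Diagnose", "Identify what the student currently understands and "
                 "where the misconception lies."),
    ("Plan", "Choose an instructional move (hint, worked example, "
             "probing question) suited to the gap."),
    ("Explain", "Deliver the explanation step by step, in language "
                "matched to the student's level."),
    ("Check", "Pose a follow-up question to verify understanding."),
]

def build_cop_prompt(problem: str, student_answer: str) -> str:
    """Assemble a prompt that asks the model to reason through
    teacher-like stages before replying to the student."""
    stage_lines = "\n".join(
        f"{i + 1}. {name}: {desc}"
        for i, (name, desc) in enumerate(COP_STAGES)
    )
    return (
        "You are an experienced teacher. Before answering, reason "
        "through these stages:\n"
        f"{stage_lines}\n\n"
        f"Problem: {problem}\n"
        f"Student's answer: {student_answer}\n"
        "Write your reasoning for each stage, then your reply to the student."
    )

prompt = build_cop_prompt("What is 3/4 + 1/2?", "4/6")
```

The point of structuring the prompt this way is that the model's output can then be filtered stage by stage (e.g. discarding responses that skip the diagnosis step) before being used as distillation data for instruction tuning.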