EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique

📅 2025-07-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) exhibit limited performance on multi-step, interpretable scientific reasoning tasks due to the absence of domain-specific scientific reasoning patterns, a lack of global coherence in reasoning paths, and no capacity for reflective self-correction. To address these challenges, we propose EduFlow, the first end-to-end scientific reasoning framework tailored for educational settings. Our method introduces (1) EduPRM, a process-aware reward model enabling fine-grained assessment of reasoning quality, and (2) EduMCTS, a domain-adapted Monte Carlo Tree Search mechanism supporting multi-stage critical feedback and reflective self-correction. Leveraging MCTS-guided trajectory generation, curriculum learning, controlled error injection, teacher-student dialogue supervision, and self-consistency optimization, EduFlow significantly improves both reasoning accuracy and consistency. We release EduMCTS-160K, a large-scale dataset of scientific reasoning trajectories, alongside open-source code, data, and models.

📝 Abstract
Multimodal large language models (MLLMs) still perform poorly on scientific tasks, particularly those requiring multi-step and interpretable reasoning. Their limitations include insufficient scientific reasoning patterns, lack of global coherence in multi-step inference, and the absence of reflective self-correction, making them unreliable in structured scientific contexts. We introduce EduFlow, the first end-to-end framework that covers the full pipeline of educational scientific reasoning, including data selection, MCTS-based trajectory construction, model training, and output optimization. At its core is EduPRM, a process-aware reward model that critiques reasoning steps with tags and justifications. EduPRM is trained via curriculum learning on three complementary supervision sources: MCTS-guided trajectories, error-injected critiques, and teacher-student dialogues, enabling dynamic adaptation to multi-stage problem solving and iterative refinement during inference. We further propose EduMCTS, a domain-adapted search framework that introduces bootstrapping actions specifically designed for educational reasoning, such as a self-reflection mechanism that promotes reflective error correction. It further leverages EduPRM's fine-grained feedback to guide the search toward higher-quality reasoning trajectories. By applying self-consistency and rejection sampling, we constructed EduMCTS-160K, a large-scale dataset of educational reasoning trajectories. Extensive experiments demonstrate that EduFlow enhances reasoning consistency and coherence. Code, data, and models will be released.
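The abstract's final construction step — applying self-consistency and rejection sampling to keep only high-quality trajectories for EduMCTS-160K — can be sketched as a simple majority-vote filter. This is an illustrative sketch, not the paper's implementation: the `(steps, final_answer)` trajectory structure is an assumption, and the real pipeline also uses EduPRM scores.

```python
from collections import Counter

def filter_trajectories(trajectories):
    """Keep only trajectories whose final answer matches the
    majority (self-consistent) answer across samples - a minimal
    rejection-sampling filter.

    Each trajectory is assumed to be a (steps, final_answer) pair.
    """
    answers = [ans for _, ans in trajectories]
    consensus, _count = Counter(answers).most_common(1)[0]
    # Reject every trajectory that disagrees with the consensus answer.
    return [(steps, ans) for steps, ans in trajectories if ans == consensus]

# Toy example: three sampled reasoning paths for one problem.
samples = [
    (["step a", "step b"], "42"),
    (["step c"], "41"),
    (["step d", "step e"], "42"),
]
kept = filter_trajectories(samples)
# The two paths agreeing on "42" are retained; the outlier is rejected.
```

In practice the consensus filter would be combined with per-step reward-model thresholds, so that a trajectory reaching the right answer through flawed steps is still rejected.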
Problem

Research questions and friction points this paper is trying to address.

Improving MLLMs' scientific reasoning with multi-step critique
Enhancing global coherence in multi-step scientific inference
Introducing reflective self-correction for structured scientific tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

EduFlow: end-to-end educational reasoning framework
EduPRM: process-aware reward model with curriculum learning
EduMCTS: domain-adapted search with self-reflection mechanism
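The EduMCTS idea of steering search with EduPRM's fine-grained feedback can be illustrated with the standard UCT selection rule, using the mean process-reward score of a partial trajectory as the value term. This is a minimal sketch under assumptions: the dictionary keys (`visits`, `reward_sum`, `name`) and the exploration constant are illustrative, and a learned reward model stands behind `reward_sum` in the real system.

```python
import math

def uct_select(children, c=1.4):
    """Pick the child node maximizing UCT, where exploitation is the
    mean process-reward score of the partial reasoning trajectory
    (a stand-in for EduPRM feedback)."""
    total = sum(ch["visits"] for ch in children) or 1

    def uct(ch):
        if ch["visits"] == 0:
            return float("inf")  # always expand unvisited steps first
        exploit = ch["reward_sum"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore

    return max(children, key=uct)

# Toy frontier: two candidate next reasoning steps with accumulated
# process-reward scores from earlier rollouts.
children = [
    {"name": "path-a", "visits": 10, "reward_sum": 9.0},
    {"name": "path-b", "visits": 10, "reward_sum": 5.0},
]
best = uct_select(children)  # equal visits, so the higher-reward path wins
```

The paper's self-reflection bootstrapping actions would appear here as extra child actions (e.g. "re-examine the previous step") competing under the same selection rule.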
Chenglin Zhu
Baichuan Inc., Peking University
Tao Zhang
Baichuan Inc.
Chong Li
Baichuan Inc., Peking University
Mingan Lin
Baichuan Inc.
Zenan Zhou
Baichuan Inc.
Jian Xie
Baichuan Inc.