Learning from Diverse Reasoning Paths with Routing and Collaboration

πŸ“… 2025-08-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of transferring the reasoning capabilities of large language models (LLMs) to compact student models under resource constraints. We identify two key limitations in existing approaches: (1) conventional token-level knowledge distillation fails to capture the teacher's holistic reasoning, and (2) multi-path distillation overlooks the inherent heterogeneity in reasoning-path quality. To overcome these, we propose a three-stage multi-path knowledge distillation framework: (1) an LLM-based automatic evaluation mechanism for quality-aware filtering that retains only high-accuracy reasoning paths; (2) a conditional routing strategy that dynamically matches paths to each student model's learning state; and (3) a student-to-student collaborative co-teaching mechanism that mitigates bias and fills knowledge gaps. Our method achieves significant improvements over both single-path and state-of-the-art multi-path distillation methods across multiple reasoning benchmarks. Ablation studies confirm the essential contribution of each component.

πŸ“ Abstract
Advances in large language models (LLMs) significantly enhance reasoning capabilities, but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher's comprehensive reasoning is challenging due to the limited scope of conventional token-level supervision. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal, as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths, as scored by an LLM-based evaluation. Second, conditional routing dynamically assigns paths tailored to each student's current learning state. Finally, cooperative peer teaching enables students to mutually distill diverse insights, addressing knowledge gaps and biases toward specific reasoning styles. Experiments demonstrate QR-Distill's superiority over traditional single- and multi-path distillation methods. Ablation studies further highlight the importance of each component, including quality filtering, conditional routing, and peer teaching, in effective knowledge transfer. Our code is available at https://github.com/LzyFischer/Distill.
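The three-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, the toy scorer and fitness signals, and the scalar "predictions" are all assumptions made for demonstration.

```python
def quality_filter(paths, scorer, threshold=0.5):
    """Stage 1: keep only reasoning paths the evaluator scores as correct.

    `scorer` stands in for the LLM-based quality evaluation.
    """
    return [p for p in paths if scorer(p) >= threshold]


def route_paths(paths, students, fitness):
    """Stage 2: assign each path to the student it currently suits best.

    `fitness(student, path)` stands in for the student's learning state,
    e.g. how much that student would benefit from that path.
    """
    assignment = {s: [] for s in students}
    for p in paths:
        best = max(students, key=lambda s: fitness(s, p))
        assignment[best].append(p)
    return assignment


def peer_teach(student_preds):
    """Stage 3: each student additionally matches its peers' mean prediction,
    transferring insights between students to offset individual biases."""
    targets = {}
    for s in student_preds:
        peers = [q for t, q in student_preds.items() if t != s]
        targets[s] = sum(peers) / len(peers)
    return targets
```

In a real distillation setup the scorer would query an evaluator LLM, routing would depend on per-student training signals, and peer teaching would operate on output distributions via a divergence loss rather than scalar averages.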
Problem

Research questions and friction points this paper is trying to address.

Distilling diverse reasoning paths from teacher to student models
Filtering and routing reasoning paths by quality and suitability
Enabling mutual knowledge transfer among student models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quality-filtered routing with cooperative distillation
Conditional routing tailored to student learning
Cooperative peer teaching for diverse insights