Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing long-chain-of-thought (Long-CoT) distillation approaches rely on post-hoc filtering of complete reasoning trajectories, overlooking dynamic collaboration among heterogeneous teacher models and often resulting in sampling redundancy and missed complementary reasoning paths. This work proposes CoRD, a novel framework that, for the first time, introduces a collaborative decoding mechanism at the step level among heterogeneous teachers. By integrating predictive perplexity scoring with beam search, CoRD dynamically guides multiple teachers to jointly generate coherent yet diverse high-quality reasoning trajectories. This approach efficiently constructs structured supervision signals, preserving high-potential hypotheses while reducing data redundancy. Consequently, the student model achieves performance close to that of the teachers with significantly fewer training samples and demonstrates strong generalization capabilities on out-of-domain and open-ended tasks.
📝 Abstract
Distilling large reasoning models (LRMs) is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
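To make the step-wise idea concrete, here is a minimal toy sketch of multi-teacher beam search over reasoning steps. It is not the authors' implementation. The abstract does not define "predictive perplexity", so the sketch substitutes ordinary perplexity over all tokens of a partial trajectory. The `Teacher` interface, the toy teachers, and every name below are hypothetical, and each teacher is assumed to report per-token log-probabilities for the step it proposes.

```python
import math
from typing import Callable, List, Tuple

# A proposed step: (step text, per-token log-probabilities for that step).
Step = Tuple[str, List[float]]
# Hypothetical teacher interface: given the steps so far, propose next steps.
Teacher = Callable[[List[str]], List[Step]]
# A beam entry: (steps so far, log-probabilities of all tokens so far).
Beam = Tuple[List[str], List[float]]


def perplexity(logprobs: List[float]) -> float:
    """exp of the mean negative log-probability; lower means more confident."""
    return math.exp(-sum(logprobs) / len(logprobs))


def collaborative_beam_search(
    teachers: List[Teacher], beam_width: int, max_steps: int
) -> List[Beam]:
    beams: List[Beam] = [([], [])]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for steps, logprobs in beams:
            # Every teacher may extend every surviving partial trajectory.
            for teacher in teachers:
                for text, step_logprobs in teacher(steps):
                    candidates.append((steps + [text], logprobs + step_logprobs))
        # Keep the beam_width trajectories with the lowest perplexity.
        candidates.sort(key=lambda c: perplexity(c[1]))
        beams = candidates[:beam_width]
    return beams


# Toy teachers (assumption): A is confident on even steps, B on odd steps.
def teacher_a(prefix: List[str]) -> List[Step]:
    lp = -0.1 if len(prefix) % 2 == 0 else -1.0
    return [(f"A{len(prefix)}", [lp, lp])]


def teacher_b(prefix: List[str]) -> List[Step]:
    lp = -1.0 if len(prefix) % 2 == 0 else -0.1
    return [(f"B{len(prefix)}", [lp, lp])]


beams = collaborative_beam_search([teacher_a, teacher_b], beam_width=2, max_steps=3)
best_steps, best_logprobs = beams[0]
print(best_steps)  # ['A0', 'B1', 'A2']
```

In this toy run the best trajectory alternates between the two teachers, which neither teacher would produce alone. That is the kind of complementary path that selecting complete single-teacher traces after the fact cannot recover.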
Problem

Research questions and friction points this paper is trying to address.

Long-CoT reasoning
distillation
multi-teacher collaboration
reasoning trace curation
heterogeneous teachers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Long-CoT reasoning
multi-teacher distillation
step-wise decoding
collaborative reasoning synthesis
perplexity-based scoring