Long-Chain Reasoning Distillation via Adaptive Prefix Alignment

πŸ“… 2026-01-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses a key limitation in existing chain-of-thought distillation methods, where lengthy reasoning trajectories generated by teacher models often contain redundant or uncertain components that hinder effective learning by smaller student models. To overcome this challenge, the authors propose P-ALIGN, a novel framework that introduces an adaptive prefix alignment mechanism. This approach dynamically truncates the teacher’s reasoning trajectory, retaining only a concise suffix that effectively guides the student, while leveraging the corresponding prefix as a high-quality supervision signal for training. Evaluated across multiple mathematical reasoning benchmarks, P-ALIGN consistently outperforms current distillation techniques, achieving an average performance gain of over 3%. These results demonstrate the effectiveness and robustness of the proposed prefix alignment strategy in enhancing the long-chain reasoning capabilities of compact student models.

πŸ“ Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in solving complex mathematical problems. Recent studies show that distilling long reasoning trajectories can effectively enhance the reasoning performance of small-scale student models. However, teacher-generated reasoning trajectories are often excessively long and structurally complex, making them difficult for student models to learn. This mismatch leads to a gap between the provided supervision signal and the learning capacity of the student model. To address this challenge, we propose Prefix-ALIGNment distillation (P-ALIGN), a framework that fully exploits teacher CoTs for distillation through adaptive prefix alignment. Specifically, P-ALIGN adaptively truncates teacher-generated reasoning trajectories by determining whether the remaining suffix is concise and sufficient to guide the student model. Then, P-ALIGN leverages the teacher-generated prefix to supervise the student model, encouraging effective prefix alignment. Experiments on multiple mathematical reasoning benchmarks demonstrate that P-ALIGN outperforms all baselines by over 3%. Further analysis indicates that the prefixes constructed by P-ALIGN provide more effective supervision signals, while avoiding the negative impact of redundant and uncertain reasoning components. All code is available at https://github.com/NEUIR/P-ALIGN.
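The core idea in the abstract — truncate the teacher's trajectory at the earliest point where the remaining suffix is both concise and sufficient to guide the student, then use the prefix as supervision — can be illustrated with a toy sketch. This is a hypothetical simplification, not the authors' implementation: the function and variable names are invented, and the `suffix_ok` callable stands in for whatever student-based sufficiency check P-ALIGN actually uses.

```python
# Toy sketch of adaptive prefix truncation (hypothetical; not the P-ALIGN code).
def truncate_trajectory(steps, suffix_ok, max_suffix_len=4):
    """Return (prefix, suffix): the longest prefix such that the remaining
    suffix is short enough and still judged sufficient to guide the student."""
    for cut in range(len(steps)):
        suffix = steps[cut:]
        if len(suffix) <= max_suffix_len and suffix_ok(suffix):
            return steps[:cut], suffix
    return [], steps  # fallback: keep the full trajectory as the suffix

# Stand-in sufficiency check (assumption: a suffix containing the final
# answer step counts as "sufficient" in this toy example).
steps = ["restate problem", "set up equation", "simplify", "solve x", "answer: 42"]
prefix, suffix = truncate_trajectory(
    steps, lambda s: "answer: 42" in s, max_suffix_len=2
)
```

In the real framework, the sufficiency judgment would presumably involve the student model itself (e.g., whether it can complete the reasoning from the truncation point), and the retained prefix would then serve as the training supervision signal.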
Problem

Research questions and friction points this paper is trying to address.

reasoning distillation
long-chain reasoning
student-teacher mismatch
complex reasoning trajectories
supervision signal
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning distillation
adaptive prefix alignment
chain-of-thought
large language models
knowledge distillation