Long-Chain Reasoning Distillation via Adaptive Prefix Alignment

πŸ“… 2026-01-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses a key limitation in existing chain-of-thought distillation methods, where lengthy reasoning trajectories generated by teacher models often contain redundant or uncertain components that hinder effective learning by smaller student models. To overcome this challenge, the authors propose P-ALIGN, a novel framework that introduces an adaptive prefix alignment mechanism. This approach dynamically truncates the teacher’s reasoning trajectory, retaining only a concise suffix that effectively guides the student, while leveraging the corresponding prefix as a high-quality supervision signal for training. Evaluated across multiple mathematical reasoning benchmarks, P-ALIGN consistently outperforms current distillation techniques, achieving an average performance gain of over 3%. These results demonstrate the effectiveness and robustness of the proposed prefix alignment strategy in enhancing the long-chain reasoning capabilities of compact student models.

πŸ“ Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in solving complex mathematical problems. Recent studies show that distilling long reasoning trajectories can effectively enhance the reasoning performance of small-scale student models. However, teacher-generated reasoning trajectories are often excessively long and structurally complex, making them difficult for student models to learn. This mismatch leads to a gap between the provided supervision signal and the learning capacity of the student model. To address this challenge, we propose Prefix-ALIGNment distillation (P-ALIGN), a framework that fully exploits teacher CoTs for distillation through adaptive prefix alignment. Specifically, P-ALIGN adaptively truncates teacher-generated reasoning trajectories by determining whether the remaining suffix is concise and sufficient to guide the student model. Then, P-ALIGN leverages the teacher-generated prefix to supervise the student model, encouraging effective prefix alignment. Experiments on multiple mathematical reasoning benchmarks demonstrate that P-ALIGN outperforms all baselines by over 3%. Further analysis indicates that the prefixes constructed by P-ALIGN provide more effective supervision signals, while avoiding the negative impact of redundant and uncertain reasoning components. All code is available at https://github.com/NEUIR/P-ALIGN.
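The core idea in the abstract — truncate the teacher's trajectory at the earliest point where the remaining suffix is both concise and sufficient to guide the student, then use the prefix as supervision — can be illustrated with a toy sketch. This is a hypothetical simplification, not the authors' implementation: the function and variable names are invented, and the `suffix_ok` callable stands in for whatever student-based sufficiency check P-ALIGN actually uses.

```python
# Toy sketch of adaptive prefix truncation (hypothetical; not the P-ALIGN code).
def truncate_trajectory(steps, suffix_ok, max_suffix_len=4):
    """Return (prefix, suffix): the longest prefix such that the remaining
    suffix is short enough and still judged sufficient to guide the student."""
    for cut in range(len(steps)):
        suffix = steps[cut:]
        if len(suffix) <= max_suffix_len and suffix_ok(suffix):
            return steps[:cut], suffix
    return [], steps  # fallback: keep the full trajectory as the suffix

# Stand-in sufficiency check (assumption: a suffix containing the final
# answer step counts as "sufficient" in this toy example).
steps = ["restate problem", "set up equation", "simplify", "solve x", "answer: 42"]
prefix, suffix = truncate_trajectory(
    steps, lambda s: "answer: 42" in s, max_suffix_len=2
)
```

In the real framework, the sufficiency judgment would presumably involve the student model itself (e.g., whether it can complete the reasoning from the truncation point), and the retained prefix would then serve as the training supervision signal.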
Problem

Research questions and friction points this paper is trying to address.

reasoning distillation
long-chain reasoning
student-teacher mismatch
complex reasoning trajectories
supervision signal
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning distillation
adaptive prefix alignment
chain-of-thought
large language models
knowledge distillation