AI Summary
This work addresses the challenge of transferring chain-of-thought (CoT) reasoning from large teacher models to smaller, less capable student models without overwhelming them with overly complex or infeasible reasoning paths. To this end, the authors propose Gen-SSD, a framework that integrates the student model into the teacher's generation process. By evaluating candidate continuations in real time, Gen-SSD selects learnable reasoning trajectories and prunes unproductive branches, embedding supervision directly into the generation phase rather than relying on post-hoc filtering. This combination of CoT reasoning, knowledge distillation, and an in-generation self-selection mechanism substantially improves both the learnability and stability of distilled reasoning paths. Experiments on mathematical reasoning benchmarks demonstrate that Gen-SSD outperforms standard knowledge distillation by an average of 5.9 points and other baselines by up to 4.7 points.
Abstract
Large reasoning models achieve strong performance on complex tasks through long chain-of-thought (CoT) trajectories, but directly transferring such reasoning processes to smaller models remains challenging. A key difficulty is that not all teacher-generated reasoning trajectories are suitable for student learning. Existing approaches typically rely on post-hoc filtering, selecting trajectories after full generation based on heuristic criteria. However, such methods cannot control the generation process itself and may still produce reasoning paths that lie outside the student's learning capacity. To address this limitation, we propose Gen-SSD (Generation-time Self-Selection Distillation), a student-in-the-loop framework that performs generation-time selection. Instead of passively consuming complete trajectories, the student evaluates candidate continuations during the teacher's sampling process, guiding the expansion of only learnable reasoning paths and enabling early pruning of unhelpful branches. Experiments on mathematical reasoning benchmarks demonstrate that Gen-SSD consistently outperforms standard knowledge distillation and recent baselines, with improvements of around 5.9 points over Standard KD and up to 4.7 points over other baselines. Further analysis shows that Gen-SSD produces more stable and learnable reasoning trajectories, highlighting the importance of incorporating supervision during generation for effective distillation.
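The student-in-the-loop selection loop described above can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: `teacher_propose`, `student_learnability`, and the candidate step list are all hypothetical stand-ins (a real system would sample continuations from a teacher LM and score them with the student model, e.g., via its log-likelihood), but the control flow mirrors the described mechanism of expanding only the most learnable candidate and pruning early.

```python
import random

# Toy sketch of generation-time self-selection in the spirit of Gen-SSD.
# Every component below is a hypothetical stand-in for the real models.

CANDIDATE_STEPS = [
    "simplify",
    "expand the product",
    "apply a known identity",
    "try an elaborate substitution with auxiliary variables",
]

def teacher_propose(prefix, k=3, rng=random):
    """Toy teacher: sample k candidate next reasoning steps."""
    return [f"{prefix} -> {rng.choice(CANDIDATE_STEPS)}" for _ in range(k)]

def student_learnability(candidate):
    """Toy student score: shorter continuations count as more learnable.
    A real student would score candidates with its own likelihood."""
    return -len(candidate)

def generate_with_selection(prompt, max_steps=3, min_score=-500, rng=random):
    """Expand only the most learnable candidate at each step; stop early
    when even the best branch falls below the learnability floor."""
    trajectory = prompt
    for _ in range(max_steps):
        candidates = teacher_propose(trajectory, rng=rng)
        best = max(candidates, key=student_learnability)
        if student_learnability(best) < min_score:
            break  # early pruning of an unlearnable branch
        trajectory = best
    return trajectory

print(generate_with_selection("solve x^2 - 1 = 0", rng=random.Random(0)))
```

The key contrast with post-hoc filtering is that the student score is consulted *inside* the sampling loop, so unhelpful branches never get fully generated in the first place.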