Validity-Calibrated Reasoning Distillation

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing reasoning distillation methods: they rely on static teacher-student hierarchies and imitate fixed trajectories, so they cope poorly with intermediate reasoning steps that are locally under-specified. The authors reformulate reasoning distillation as a problem of local learning-signal allocation and introduce a dynamic supervision mechanism grounded in local validity. The method compares the next-step actions proposed by the student and the teacher under the same prefix and uses their relative local validity to modulate distillation strength, yielding context-dependent updates without strict imitation of a fixed reasoning path. Across mathematical reasoning, code generation, and instruction-following benchmarks, it consistently outperforms strong distillation baselines.

📝 Abstract

Reasoning distillation aims to transfer multi-step reasoning capabilities from large language models to smaller, more efficient ones. While recent methods have shown promising gains, they typically rely on static teacher-student hierarchies and frame distillation as trajectory imitation. This is misaligned with the structure of reasoning, where intermediate steps are often locally under-specified: global correctness constrains the final answer, but does not uniquely determine each intermediate move. We propose validity-calibrated reasoning distillation, a framework that treats reasoning distillation as a problem of local learning-signal allocation rather than path alignment. Instead of enforcing token-level imitation, we compare the student's and teacher's proposed next-step actions under the same prefix and use their relative local validity to modulate the strength of the distillation update. This yields a dynamic, context-dependent supervision mechanism that preserves the teacher's structural guidance while adapting update strength to local reasoning quality. Across mathematical reasoning, code generation, and instruction-following benchmarks, our method consistently outperforms strong distillation baselines. These results indicate that effective LLM reasoning distillation is governed not by rigid trajectory imitation, but by principled, locally calibrated allocation of learning signal.
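
The abstract does not give the weighting rule, so the following is only a minimal sketch of what validity-modulated distillation could look like, not the authors' implementation. It assumes each reasoning step comes with a scalar local-validity score for the teacher's and the student's proposed next action (how those scores are obtained is left open here), and it assumes a logistic function of the validity gap as the weight on a per-step KL term. The names `validity_weight`, `calibrated_distill_loss`, and the `temperature` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def validity_weight(teacher_validity, student_validity, temperature=1.0):
    """Assumed weighting rule: logistic function of the validity gap.

    The weight approaches 1 when the teacher's next step is judged more
    valid than the student's, and approaches 0 in the opposite case.
    """
    gap = (teacher_validity - student_validity) / temperature
    return 1.0 / (1.0 + math.exp(-gap))


def calibrated_distill_loss(steps, temperature=1.0):
    """Average per-step KL(teacher || student), scaled by the validity weight.

    Each step is a dict with keys (all hypothetical):
      teacher_logits, student_logits: next-step logits under the same prefix
      teacher_validity, student_validity: scalar local-validity scores
    """
    total = 0.0
    for step in steps:
        p_teacher = softmax(step["teacher_logits"])
        p_student = softmax(step["student_logits"])
        w = validity_weight(
            step["teacher_validity"], step["student_validity"], temperature
        )
        total += w * kl_divergence(p_teacher, p_student)
    return total / len(steps)
```

Under these assumptions, a step where the student's own move is judged at least as valid as the teacher's contributes a down-weighted imitation term, while a step where the teacher is clearly better is distilled at close to full strength.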

Problem

Research questions and friction points this paper is trying to address.

reasoning distillation

trajectory imitation

local validity

learning signal allocation

intermediate steps

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning distillation

validity calibration

local learning signal