Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the generalization of reasoning distillation models—specifically, whether distilled student models genuinely inherit teachers’ reasoning behaviors in novel test settings or revert to their original, undistilled patterns. Method: We propose the first cross-model reasoning provenance framework, which attributes each output token to its source (teacher, original student, or distilled student) by quantifying predictive probability distribution discrepancies across these models under identical contexts. We introduce a novel taxonomy for reasoning distillation provenance and design a principle-driven data selection method based on teacher–student disagreement, replacing conventional heuristic strategies. Results: Experiments demonstrate that distilled models generate a substantial proportion of “teacher-originated actions,” and this proportion strongly correlates with downstream task performance. Our approach consistently improves distillation efficacy across diverse teacher–student model pairs, empirically validating both the transferability and attributability of reasoning capabilities.

📝 Abstract
Reasoning distillation has attracted increasing attention. It typically leverages a large teacher model to generate reasoning paths, which are then used to fine-tune a student model so that it mimics the teacher's behavior in training contexts. However, previous approaches have lacked a detailed analysis of the origins of the distilled model's capabilities. It remains unclear whether the student can maintain behaviors consistent with the teacher in novel test-time contexts, or whether it regresses to its original output patterns, raising concerns about the generalization of distilled models. To analyze this question, we introduce a cross-model Reasoning Distillation Provenance Tracing framework. For each action (e.g., a sentence) produced by the distilled model, we obtain the predictive probabilities assigned by the teacher, the original student, and the distilled model under the same context. By comparing these probabilities, we classify each action into different categories. By systematically disentangling the provenance of each action, we experimentally demonstrate that, in test-time contexts, the distilled model can indeed generate teacher-originated actions, which correlate with and plausibly explain the observed performance of the distilled model. Building on this analysis, we further propose a teacher-guided data selection method. Unlike prior approaches that rely on heuristics, our method directly compares teacher-student divergences on the training data, providing a principled selection criterion. We validate the effectiveness of our approach across multiple representative teacher models and diverse student models. The results highlight the utility of our provenance-tracing framework and underscore its promise for reasoning distillation. We hope to share Reasoning Distillation Provenance Tracing and our insights into reasoning distillation with the community.
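The provenance idea described above can be sketched in a few lines: for each action the distilled model generates, score it under the teacher and the original (undistilled) student in the same context, and attribute it according to which model finds it more likely. The function name, the log-probability inputs, and the margin threshold below are illustrative assumptions, not the paper's actual classifier:

```python
def classify_action(logp_teacher, logp_student, margin=1.0):
    """Attribute one distilled-model action (e.g., a generated sentence).

    `logp_teacher` and `logp_student` are the total log-probabilities that
    the teacher and the *original* student assign to the same action under
    the same context. `margin` (in nats) is a hypothetical threshold
    separating a clear preference from shared behavior.
    """
    gap = logp_teacher - logp_student
    if gap > margin:
        # Teacher finds the action far more likely than the original
        # student: evidence of a teacher-originated behavior.
        return "teacher-originated"
    if gap < -margin:
        # The original student already preferred this action: the
        # distilled model is reverting to its own pattern.
        return "student-originated"
    # Both models assign similar likelihood: shared behavior.
    return "shared"


# Toy example with made-up log-probabilities.
print(classify_action(-5.0, -12.0))  # teacher much more confident
```

Aggregating these labels over a full generation gives the proportion of teacher-originated actions that the summary reports as correlating with downstream performance.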
Problem

Research questions and friction points this paper is trying to address.

Traces origins of distilled model's actions in reasoning distillation.
Analyzes generalization of student models to novel test contexts.
Proposes teacher-guided data selection for principled distillation training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-model provenance tracing framework for distillation
Teacher-guided data selection via divergence comparison
Categorizing actions by comparing predictive probabilities
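The teacher-guided data selection contribution can likewise be sketched: score each candidate training example by teacher-student divergence and keep the examples where the two models disagree most, on the premise that these carry the most teacher-specific signal. The function names and the averaged next-token-distribution simplification are assumptions for illustration:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_training_examples(examples, k):
    """Keep the k examples where teacher and student diverge most.

    `examples` is a list of (example_id, teacher_dist, student_dist)
    tuples; each distribution is an averaged next-token distribution
    over the example (a simplification of per-token scoring).
    """
    scored = [(kl_divergence(t_dist, s_dist), ex_id)
              for ex_id, t_dist, s_dist in examples]
    scored.sort(reverse=True)  # highest divergence first
    return [ex_id for _, ex_id in scored[:k]]


# Toy corpus: example "b" is where the models disagree.
corpus = [
    ("a", [0.9, 0.1], [0.9, 0.1]),  # teacher and student agree
    ("b", [0.9, 0.1], [0.1, 0.9]),  # strong disagreement
]
print(select_training_examples(corpus, k=1))
```

Ranking by divergence replaces heuristic filters (e.g., answer correctness or length) with a criterion computed directly from the two models' predictive distributions.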
Kaiyuan Liu
Harbin Institute of Technology
Multi-agents Collaboration · Agents Evaluations · Information Management
Shaotian Yan
Alibaba Group
Machine Learning · Computer Vision · Large Language Models
Rui Miao
Meta
Networking · Networked Systems · Distributed Systems
Bing Wang
College of Computer Science and Technology, Jilin University
Chen Shen
Alibaba Cloud Computing
Jun Zhang
Department of Mathematics, University of Michigan
Jieping Ye
Alibaba Cloud Computing