Information-Preserving Reformulation of Reasoning Traces for Antidistillation

📅 2025-10-13

🤖 AI Summary

Exposed reasoning traces from large language models (LLMs) are vulnerable to unauthorized distillation, which threatens the provider's intellectual property. This paper proposes PART, a defense that exploits the difference between how humans read chain-of-thought (CoT) reasoning and how student models learn from it during supervised fine-tuning. PART reformulates each trace in two steps: (1) removing self-talk segments and (2) reordering sub-conclusions. The reformulation keeps the information users rely on while disrupting the signal that distillation exploits. A small auxiliary model is trained to perform the reformulation, so the added computational overhead is minimal. In experiments, training a 32B student model on reformulated traces lowers its AIME 2024 score from 54.17 to 46.88, a relative drop of 13.5%, and the disruption holds across student models of different sizes and types on various reasoning benchmarks. The central idea is to turn the gap between human and model use of reasoning traces into a defense that balances reasoning transparency with IP protection.

📝 Abstract

Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.
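The 13.5% figure is a relative degradation, not a drop in percentage points. A minimal check of the arithmetic, using only the two scores quoted in the abstract (the function name `relative_drop` is ours, introduced for illustration):

```python
def relative_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Scores quoted in the abstract for the 32B student model on AIME 2024.
baseline = 54.17      # score before training on reformulated traces
reformulated = 46.88  # score when training on PART-reformulated traces

absolute_drop = baseline - reformulated
print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop(baseline, reformulated):.1f}%")
```

The absolute gap is 7.29 points; dividing by the baseline of 54.17 gives the reported 13.5%.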

Problem

Research questions and friction points this paper is trying to address.

Preserving reasoning trace information while preventing unauthorized model distillation

Reformulating reasoning traces to disrupt distillation without losing their value to human readers

Protecting proprietary models from distillation while maintaining interpretable reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Removes self-talk behaviors from reasoning traces

Reorders sub-conclusions to disrupt distillation while preserving the trace's information

Uses a small auxiliary model to perform the reformulation with minimal computational overhead
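To make the two steps concrete, here is a toy, rule-based sketch. It is not the paper's method: PART trains a small auxiliary model to do the reformulation, whereas this stand-in uses hand-written markers. The marker lists, the function names, and the specific reordering rule (moving each sub-conclusion ahead of the steps that lead up to it) are all assumptions made for illustration.

```python
import re

# Hypothetical surface markers; the paper trains a model instead of using rules.
SELF_TALK = re.compile(r"^(hmm|wait|let me|okay|ok)\b", re.IGNORECASE)
CONCLUSION = re.compile(r"^(so|therefore|thus|hence)\b", re.IGNORECASE)


def remove_self_talk(steps):
    """Step 1: drop sentences that open with a self-talk marker."""
    return [s for s in steps if not SELF_TALK.match(s)]


def reorder_sub_conclusions(steps):
    """Step 2 (assumed rule): place each sub-conclusion before its supporting steps."""
    out, pending = [], []
    for s in steps:
        if CONCLUSION.match(s):
            out.append(s)        # emit the sub-conclusion first
            out.extend(pending)  # then the steps that preceded it
            pending = []
        else:
            pending.append(s)
    out.extend(pending)          # trailing steps with no conclusion keep their order
    return out


def reformulate(steps):
    """Apply both steps; no remaining sentence is dropped or altered by step 2."""
    return reorder_sub_conclusions(remove_self_talk(steps))


trace = [
    "Let me think about this step by step.",
    "The equation is 2x + 6 = 10.",
    "Subtract 6 from both sides: 2x = 4.",
    "Wait, let me double-check the subtraction.",
    "So x = 2.",
    "Substituting back: 2*2 + 6 = 10.",
    "Therefore the solution checks out.",
]

result = reformulate(trace)
for line in result:
    print(line)
```

In this sketch the reordering step only permutes sentences, so every sentence that survives step 1 is still present for a human reader, while the step-by-step order a student model would imitate is changed.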
