Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pruning large reasoning models (LRMs) often leads to significant degradation in reasoning capability, and directly adapting pruning techniques from large language models (LLMs) yields suboptimal results due to fundamental architectural and behavioral differences. Method: This paper introduces Selective Self-Generated Reasoning (SSGR), the first systematic pruning framework tailored to LRMs. SSGR leverages the model's own generated reasoning paths as calibration data and employs a dual-criteria filtering mechanism, based on reasoning difficulty and path length, to select calibration data that best preserves the model's activation distributions during pruning. Contribution/Results: By resolving the trade-off between calibration data quality and pruning robustness, SSGR improves post-pruning reasoning performance by 10%-13% on the DeepSeek-R1-Distill series, substantially outperforming conventional pruning methods. The approach provides a scalable, self-supervised pathway to efficient LRM deployment without reliance on external or manually annotated data.
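The dual-criteria filtering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class, function, and threshold names are hypothetical, whitespace tokens stand in for a real tokenizer, and "incorrect attempt" is used as a simple proxy for reasoning difficulty.

```python
# Hypothetical sketch of SSGR-style dual-criteria filtering.
# All names and thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ReasoningSample:
    question: str
    reasoning_path: str   # self-generated chain of thought
    is_correct: bool      # whether the model solved the question

def select_calibration_data(samples, min_tokens=512, max_tokens=4096,
                            keep_hard_only=True):
    """Keep challenging, moderately long self-generated reasoning paths.

    Difficulty proxy (assumption): a question counts as challenging when
    the model's attempt is incorrect. Length is measured in whitespace
    tokens as a stand-in for the model's tokenizer.
    """
    selected = []
    for s in samples:
        n_tokens = len(s.reasoning_path.split())
        too_short = n_tokens < min_tokens
        too_long = n_tokens > max_tokens
        too_easy = s.is_correct and keep_hard_only
        if not (too_short or too_long or too_easy):
            selected.append(s)
    return selected

samples = [
    ReasoningSample("q1", "step " * 1000, is_correct=False),  # kept
    ReasoningSample("q2", "step " * 100,  is_correct=False),  # too short
    ReasoningSample("q3", "step " * 1000, is_correct=True),   # too easy
    ReasoningSample("q4", "step " * 5000, is_correct=False),  # too long
]
kept = select_calibration_data(samples)
```

Only the challenging, moderately long sample (`q1`) survives the filter; the short, easy, and overlong traces are discarded before calibration.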

📝 Abstract
Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning benchmarks. However, their long chain-of-thought reasoning processes incur significant inference overhead. Pruning has emerged as a promising approach to reducing computational costs. However, existing efforts have primarily focused on large language models (LLMs), while pruning LRMs remains unexplored. In this work, we conduct the first empirical study on pruning LRMs and show that directly applying existing pruning techniques fails to yield satisfactory results. Our findings indicate that using self-generated reasoning data for calibration can substantially improve pruning performance. We further investigate how the difficulty and length of reasoning data affect pruning outcomes. Our analysis reveals that challenging and moderately long self-generated reasoning data serve as ideal calibration data. Based on these insights, we propose a Selective Self-Generated Reasoning (SSGR) data construction strategy to provide effective calibration data for pruning LRMs. Experimental results on the DeepSeek-R1-Distill model series validate that our strategy improves the reasoning ability of pruned LRMs by 10%-13% compared to general pruning methods.
Problem

Research questions and friction points this paper is trying to address.

LRMs' long chain-of-thought inference incurs significant computational overhead, motivating pruning
Pruning techniques designed for LLMs fail to transfer directly to reasoning models
The choice of calibration data critically determines pruned model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective self-generated reasoning data used as calibration data for pruning
Challenging, moderately long reasoning traces yield the best pruning outcomes
SSGR improves the reasoning ability of pruned models by 10%-13% over general pruning methods
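To show where the selected calibration data plugs in, here is a sketch of an activation-aware pruning criterion in the style of Wanda (score = |weight| times the L2 norm of the corresponding input activation). This is an assumption for illustration: the paper's summary does not specify which pruning criterion SSGR pairs with, and the function below is a toy NumPy version for a single linear layer.

```python
import numpy as np

def wanda_style_prune(W, X_calib, sparsity=0.5):
    """Unstructured pruning with a Wanda-style activation-aware score.

    W:        (out_features, in_features) weights of a linear layer.
    X_calib:  (n_tokens, in_features) activations recorded while running
              the calibration set (in SSGR, the selected self-generated
              reasoning data).
    Score s_ij = |W_ij| * ||X[:, j]||_2; the lowest-scoring fraction
    `sparsity` of weights in each output row is zeroed.
    """
    act_norm = np.linalg.norm(X_calib, axis=0)     # (in_features,)
    score = np.abs(W) * act_norm[None, :]          # (out, in)
    k = int(W.shape[1] * sparsity)                 # weights dropped per row
    pruned = W.copy()
    if k > 0:
        # indices of the k lowest-score weights in each output row
        drop = np.argpartition(score, k - 1, axis=1)[:, :k]
        np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

# Toy example: identity activations reduce the score to |W|,
# so 50% sparsity removes the two smallest-magnitude weights.
W = np.array([[1.0, 2.0, 3.0, 4.0]])
P = wanda_style_prune(W, np.eye(4), sparsity=0.5)
```

The paper's central point is that the quality of `X_calib`, not the scoring rule, is the lever: feeding the same criterion activations from challenging, moderately long self-generated reasoning is what recovers post-pruning performance.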