DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models

πŸ“… 2025-05-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large reasoning models (LRMs) rely on long chain-of-thought (CoT) reasoning to solve complex tasks, yet their reasoning traces are often verbose, and the redundant steps incur substantial computational overhead. To address this, the paper proposes Distilled Reasoning Pruning (DRP), a hybrid framework that combines inference-time pruning with tuning-based distillation. A teacher model decomposes each CoT into skill-granular sub-steps and prunes redundant content; a student model is then fine-tuned on the pruned reasoning paths, learning to reason both compactly and accurately. On GSM8K, DRP reduces average token usage by 64% (from 917 to 328 tokens) while improving accuracy from 91.7% to 94.1%; on AIME, it achieves a 43% token reduction with no accuracy loss. The authors argue these results exceed what pruning-only or distillation-only approaches achieve on their own.

πŸ“ Abstract
While Large Reasoning Models (LRMs) have demonstrated success in complex reasoning tasks through long chain-of-thought (CoT) reasoning, their inference often involves excessively verbose reasoning traces, resulting in substantial inefficiency. To address this, we propose Distilled Reasoning Pruning (DRP), a hybrid framework that combines inference-time pruning with tuning-based distillation, two widely used strategies for efficient reasoning. DRP uses a teacher model to perform skill-aware step decomposition and content pruning, and then distills the pruned reasoning paths into a student model, enabling it to reason both efficiently and accurately. Across several challenging mathematical reasoning datasets, we find that models trained with DRP achieve substantial improvements in token efficiency without sacrificing accuracy. Specifically, DRP reduces average token usage on GSM8K from 917 to 328 while improving accuracy from 91.7% to 94.1%, and achieves a 43% token reduction on AIME with no performance drop. Further analysis shows that aligning the reasoning structure of training CoTs with the student's reasoning capacity is critical for effective knowledge transfer and performance gains.
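The teacher-side pipeline the abstract describes (skill-aware step decomposition, content pruning, then building distillation targets for the student) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the digit-based skill heuristic, and the data format are all assumptions made for clarity.

```python
# Hypothetical sketch of the DRP data-construction pipeline.
# The skill classifier here is a toy heuristic standing in for the
# teacher model's skill-aware decomposition; names are illustrative.

def decompose_into_skill_steps(cot: str) -> list[dict]:
    """Split a chain-of-thought into sub-steps, each tagged with a skill.
    (Toy heuristic: lines containing digits are tagged 'arithmetic'.)"""
    steps = []
    for line in cot.strip().splitlines():
        skill = "arithmetic" if any(ch.isdigit() for ch in line) else "planning"
        steps.append({"skill": skill, "text": line.strip()})
    return steps

def prune_redundant_steps(steps: list[dict],
                          essential_skills=frozenset({"arithmetic"})) -> list[dict]:
    """Keep only steps whose skill is deemed essential; drop filler/restatements."""
    return [s for s in steps if s["skill"] in essential_skills]

def build_distillation_example(question: str, cot: str, answer: str) -> dict:
    """Produce a (prompt, compact-CoT target) pair for student fine-tuning."""
    pruned = prune_redundant_steps(decompose_into_skill_steps(cot))
    target = "\n".join(s["text"] for s in pruned) + f"\nAnswer: {answer}"
    return {"prompt": question, "target": target}

example = build_distillation_example(
    "Tom has 3 bags of 4 apples. How many apples in total?",
    "We need the total number of apples.\n3 * 4 = 12.",
    "12",
)
print(example["target"])  # the planning filler line is pruned away
```

In the actual method, both the decomposition and the pruning are performed by a teacher LRM rather than heuristics, and the resulting compact traces are used to fine-tune the student so its reasoning structure matches its capacity.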
Problem

Research questions and friction points this paper is trying to address.

Long CoT traces in LRMs are excessively verbose, wasting inference compute
Pruning-only and distillation-only strategies each fall short of the efficiency gains possible when combined
How to improve token efficiency without sacrificing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining pruning and distillation
Skill-aware step decomposition for efficient reasoning
Aligns reasoning structure with student capacity
πŸ”Ž Similar Papers
No similar papers found.
Yuxuan Jiang
University of Maryland, Baltimore County
Dawei Li
Arizona State University
Francis Ferraro
University of Maryland, Baltimore County
NLP · Computational Linguistics