PACE: Prefix-Protected and Difficulty-Aware Compression for Efficient Reasoning

πŸ“… 2026-02-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the inefficiency of language reasoning models that, during inference, often generate excessively long reasoning traces due to β€œoverthinking,” leading to increased latency and memory overhead. Existing uniform length-penalization strategies tend to truncate crucial early reasoning steps and fail to account for varying problem difficulty. To overcome these limitations, the authors propose PACE, a framework that introduces prefix preservation at the sequence level to retain effective reasoning paths and incorporates difficulty-aware length penalization at the group level to dynamically modulate compression intensity. PACE is the first approach to jointly leverage prefix protection and difficulty awareness, enabling hierarchical supervision across the two compression levels. Evaluated on DeepSeek-R1-Distill-Qwen models (1.5B and 7B), PACE reduces token usage by up to 55.7% on mathematical benchmarks while improving accuracy by as much as 4.1%, and demonstrates strong generalization across code, scientific, and general-domain tasks.
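The group-level mechanism lends itself to a compact illustration. The paper text here does not give the exact formula, so the following is a minimal Python sketch of one plausible reading: within a group of rollouts sampled for the same query (as in GRPO-style training), empirical accuracy serves as an inverse difficulty proxy, and the length-penalty coefficient is scaled by it so that hard queries are compressed less aggressively. All names (`difficulty_aware_penalty`, `base_coef`) are hypothetical, not the paper's API.

```python
import numpy as np

def difficulty_aware_penalty(rewards, lengths, base_coef=0.1):
    """Hypothetical sketch of a group-level, difficulty-aware length penalty.

    `rewards` are 0/1 correctness scores for a group of rollouts sampled
    from the same query; `lengths` are their token counts. The group's
    empirical accuracy acts as an inverse difficulty proxy: easy queries
    (high accuracy) receive a stronger length penalty, while hard queries
    (low accuracy) are penalized less, preserving exploration.
    """
    rewards = np.asarray(rewards, dtype=float)
    lengths = np.asarray(lengths, dtype=float)

    accuracy = rewards.mean()          # difficulty proxy: high = easy query
    coef = base_coef * accuracy        # scale the penalty by easiness

    # Penalize length relative to the group mean, so the signal is
    # comparable across queries with different typical trace lengths.
    norm_len = (lengths - lengths.mean()) / (lengths.std() + 1e-8)
    return rewards - coef * norm_len   # shaped per-rollout rewards
```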

πŸ“ Abstract
Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose PACE, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to maintain valid reasoning paths while promoting conciseness. At the group level, a difficulty-aware penalty dynamically scales length constraints based on query complexity, maintaining exploration for harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to 55.7%) while simultaneously improving accuracy (up to 4.1%) on math benchmarks, and generalizes to code, science, and general domains.
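The sequence-level side, "prefix-protected optimization with decaying mixed rollouts", can likewise be sketched. The abstract does not define the mixing schedule, so the snippet below assumes a linear decay and a generic `model.generate(prompt)` interface; it only illustrates the idea of mixing free rollouts with rollouts forced to continue from a protected prefix of a known-valid reasoning trace, with the protected fraction decaying over training. Every name here is an assumption for illustration.

```python
import random

def mixed_rollouts(model, query, ref_trace, step, total_steps, n_samples=8):
    """Hypothetical sketch of prefix-protected 'decaying mixed rollouts'.

    A fraction of the rollouts is forced to continue from a protected
    prefix of a known-valid reasoning trace (`ref_trace`); both the
    protected prefix length and the forced fraction decay over training,
    so the policy gradually takes over generating full traces on its own.
    `model.generate(prompt)` is an assumed text-generation interface.
    """
    keep_ratio = max(0.0, 1.0 - step / total_steps)   # linear decay schedule
    prefix_len = int(keep_ratio * len(ref_trace))
    protected_prefix = ref_trace[:prefix_len]

    rollouts = []
    for _ in range(n_samples):
        if random.random() < keep_ratio:
            # Protected rollout: continue from the valid reasoning prefix.
            rollouts.append(protected_prefix +
                            model.generate(query + protected_prefix))
        else:
            # Free rollout: sample the full trace from scratch.
            rollouts.append(model.generate(query))
    return rollouts
```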
Problem

Research questions and friction points this paper is trying to address.

overthinking
reasoning compression
length penalty
query difficulty
reasoning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

prefix-protected compression
difficulty-aware penalty
hierarchical supervision
reasoning efficiency
language reasoning models
πŸ‘₯ Authors

Ruixiang Feng (University of Electronic Science and Technology of China)
Yuntao Wen (University of Electronic Science and Technology of China)
Silin Zhou (University of Electronic Science and Technology of China)
Ke Shi (University of Electronic Science and Technology of China)
Yifan Wang (Dalian University of Technology)
Ran Le (Nanbeige Lab, BOSS Zhipin)
Zhenwei An (Nanbeige Lab, BOSS Zhipin)
Zongchao Chen (Nanbeige Lab, BOSS Zhipin)
Chen Yang (Nanbeige Lab, BOSS Zhipin)
Guangyue Peng (Peking University)
Yiming Jia (Nanbeige Lab, BOSS Zhipin)
Dongsheng Wang (University of Electronic Science and Technology of China)
Tao Zhang (Nanbeige Lab, BOSS Zhipin)
Lisi Chen (University of Electronic Science and Technology of China)
Yang Song (Nanbeige Lab, BOSS Zhipin)
Shen Gao (University of Electronic Science and Technology of China)
Shuo Shang