🤖 AI Summary
To address the efficiency bottleneck posed by lengthy output generation in large language models (LLMs) during complex reasoning, this paper proposes a dynamic ratio reweighting training paradigm grounded in dual-process cognitive theory. The method adaptively prunes reasoning chains by adjusting the training weights between System-1 (intuitive) and System-2 (deliberative) data online, requiring no human annotations, auxiliary models, or model-ensemble techniques. Its core innovations are (i) a learnable dynamic weight scheduling mechanism and (ii) a synergistic Chain-of-Thought distillation framework for joint optimization. Evaluated on the DeepSeek-R1-Distill model series, the approach reduces output token count by an average of 39.7% while preserving inference accuracy, yielding substantial gains in throughput and deployment feasibility for long-reasoning tasks.
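The core idea, reweighting System-1 and System-2 training losses with a weight that shifts online, can be sketched as a toy loop. Everything below (the convex loss combination, the accuracy-guarded update rule, and all names) is an illustrative assumption, not the paper's actual implementation:

```python
# Toy sketch of dynamic ratio reweighting between System-1 (short, intuitive)
# and System-2 (long, deliberative CoT) training data.
# The update rule and all names here are hypothetical.

def mixed_loss(loss_s1: float, loss_s2: float, w: float) -> float:
    """Convex combination of per-batch losses; w is the System-1 share."""
    return w * loss_s1 + (1.0 - w) * loss_s2

def update_weight(w: float, acc_delta: float, lr: float = 0.05) -> float:
    """Shift weight toward System-1 (shorter outputs) while accuracy holds.

    acc_delta: change in validation accuracy since the last check.
    If accuracy drops, back off toward System-2 data instead.
    """
    step = lr if acc_delta >= 0 else -lr
    return min(1.0, max(0.0, w + step))

# Simulated schedule: accuracy stays stable, so the System-1 share grows,
# which in a real run would progressively prune redundant reasoning tokens.
w = 0.2
for _ in range(10):
    w = update_weight(w, acc_delta=0.0)
print(round(w, 2))  # → 0.7
```

In the paper the scheduling mechanism is learnable rather than rule-based as above; the sketch only illustrates the direction of the trade-off being balanced.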
📝 Abstract
Large Language Models (LLMs) have recently achieved remarkable progress by leveraging reinforcement learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pipeline that relies on neither sophisticated data annotations nor interpolation between multiple models. We continuously balance the weights between the model's System-1 and System-2 data to eliminate redundant reasoning while preserving the model's reasoning capability. We validate our approach on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B across a diverse set of benchmarks with varying difficulty levels. Our method reduces the number of output tokens by nearly 40% while maintaining reasoning accuracy. Our code and data will be released soon.