Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria

📅 2024-12-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the computational overhead in large language model (LLM) inference caused by redundant intermediate reasoning steps, this paper proposes a sentence-level rationale compression framework. Unlike prevailing token-level compression methods, the approach operates at sentence granularity, jointly modeling sentence likelihood and redundancy to build an interpretable, learnable verbosity scoring mechanism, combined with supervised distillation-based pruning for efficient inference. Experiments across multiple models (e.g., LLaMA-2, Qwen) and reasoning-intensive benchmarks (e.g., GSM8K, HotpotQA) show an average 17.15% reduction in generated sequence length, substantial decreases in inference latency and GPU memory consumption, and no accuracy degradation. The key contributions are: (1) the first sentence-level rationale compression paradigm; (2) a likelihood-based, interpretable redundancy (verbosity) metric; and (3) an end-to-end compression framework that preserves answer accuracy while improving inference efficiency.

📝 Abstract
Large Language Models (LLMs) rely on generating extensive intermediate reasoning units (e.g., tokens, sentences) to enhance final answer quality across a wide range of complex tasks. While generating multiple reasoning paths or iteratively refining rationales proves effective for improving performance, these approaches inevitably result in significantly higher inference costs. In this work, we propose a novel sentence-level rationale reduction training framework that leverages a likelihood-based criterion, termed verbosity, to identify and remove redundant reasoning sentences. Unlike previous approaches that rely on token-level reduction, our sentence-level reduction framework maintains model performance while reducing generation length. This preserves the original reasoning abilities of LLMs and achieves an average 17.15% reduction in generation costs across various models and tasks.
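The abstract describes scoring whole rationale sentences by a likelihood-based verbosity criterion and pruning the redundant ones, but does not reproduce the exact formula. As a rough illustration only, the sketch below treats a sentence's verbosity as how little its removal lowers the likelihood of the reasoning that follows it; the `logprob` interface, the scoring formula, and the threshold are assumptions for this sketch, not the authors' implementation.

```python
def verbosity_scores(sentences, question, logprob):
    """Score each rationale sentence's redundancy (higher = more verbose).

    `logprob(context, text)` is a caller-supplied function assumed to return
    the average token log-probability of `text` given `context` under some
    language model. A sentence is verbose if the subsequent reasoning is
    almost as likely without it as with it.
    """
    scores = []
    for i in range(len(sentences)):
        rest = " ".join(sentences[i + 1:])
        if not rest:
            # Never score the final sentence; it typically states the answer.
            scores.append(float("-inf"))
            continue
        ctx_with = " ".join([question] + sentences[: i + 1])
        ctx_without = " ".join([question] + sentences[:i])
        # Near-zero (or positive) difference => sentence i adds little
        # information for the continuation => high verbosity.
        scores.append(logprob(ctx_without, rest) - logprob(ctx_with, rest))
    return scores


def prune_rationale(sentences, question, logprob, threshold=-0.05):
    """Keep only sentences whose removal would noticeably hurt the
    likelihood of the remaining reasoning (score below `threshold`)."""
    scores = verbosity_scores(sentences, question, logprob)
    return [s for s, v in zip(sentences, scores) if v < threshold]
```

In a real system, `logprob` would be computed from a model's per-token log-probabilities; here it is left abstract so the pruning logic stands alone.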
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Computation Cost
Time Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sentence-level Optimization
Cost Reduction
Adaptive Model Efficiency