Rational Metareasoning for Large Language Models

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from high inference costs and redundant reasoning steps, limiting their efficiency and scalability. Method: This paper proposes a reasoning-control framework grounded in computational models of metareasoning from cognitive science. Its core innovation is integrating "value of computation" (VoC) modeling into LLM inference control, implemented via a value-aware reward function, reinforcement training with Expert Iteration, and a dynamic decision mechanism that lets the model autonomously determine whether to generate intermediate reasoning steps. Contribution/Results: Evaluated on three mainstream LLMs, the method reduces inference tokens by 20-37% while matching the accuracy of few-shot chain-of-thought and STaR baselines, achieving a better cost-performance trade-off without sacrificing accuracy.

📝 Abstract
Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption, inference costs are correspondingly becoming increasingly burdensome. How, then, might we optimize reasoning's cost-performance tradeoff? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37% fewer tokens generated across three models) while maintaining task performance across diverse datasets.
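The abstract describes a reward that incorporates the Value of Computation by penalizing unnecessary reasoning. A minimal sketch of such a reward, assuming the simplest form of task utility minus a per-token cost (the function name, utility values, and cost coefficient are illustrative, not the paper's exact formulation):

```python
def voc_reward(answer_correct: bool, num_reasoning_tokens: int,
               token_cost: float = 0.001) -> float:
    """Value-of-computation-style reward (illustrative sketch):
    task utility minus a per-token penalty, so reasoning steps that
    do not change the answer's correctness lower the reward."""
    utility = 1.0 if answer_correct else 0.0
    return utility - token_cost * num_reasoning_tokens
```

Under this shape, a correct answer with no reasoning scores 1.0, a correct answer reached via a 100-token chain scores lower, and extra tokens on an incorrect answer push the reward negative, which is what trains the model to reason only when it helps.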
Problem

Research questions and friction points this paper is trying to address.

Optimize cost-performance tradeoff in LLM reasoning
Reduce inference costs by minimizing unnecessary reasoning steps
Maintain task performance while lowering token generation by 20-37%
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective use of intermediate reasoning steps
Reward function penalizing unnecessary reasoning
Expert Iteration for cost-performance optimization
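The Expert Iteration point above amounts to a generate-score-select loop: sample candidate responses with and without intermediate reasoning, score them with the value-aware reward, and keep the best trace as fine-tuning data. A self-contained sketch of the selection step, under assumed data and reward shapes (not the paper's implementation):

```python
def select_best_trace(candidates, token_cost=0.001):
    """One selection step of an expert-iteration loop (illustrative):
    score each candidate response by task utility minus a per-token
    penalty, and keep the highest-scoring trace as training data."""
    def reward(c):
        utility = 1.0 if c["correct"] else 0.0
        return utility - token_cost * c["tokens"]
    return max(candidates, key=reward)

# When direct answering and chain-of-thought are both correct,
# the shorter trace wins because the longer one pays a token penalty.
candidates = [
    {"trace": "answer only", "correct": True, "tokens": 5},
    {"trace": "step-by-step reasoning", "correct": True, "tokens": 120},
    {"trace": "wrong direct answer", "correct": False, "tokens": 5},
]
best = select_best_trace(candidates)
```

Iterating this selection and fine-tuning on the kept traces is what pushes the model toward reasoning only on inputs where the extra tokens buy accuracy.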