Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

📅 2025-02-27

🤖 AI Summary

Problem: Long reasoning chains in large language models (LLMs) suffer from error propagation, high computational overhead, and inefficient trial-and-error. Method: This paper proposes a metacognition-inspired framework for dynamically regulating reasoning that (1) decouples high-level metacognitive policy control from low-level text generation, enabling real-time modeling of the reasoning state and assessment of progress; (2) introduces a contextual multi-armed bandit mechanism for online optimization of reasoning-path selection and computational resource allocation; and (3) supports adaptive operations, including backtracking, clarification, restarting, and path switching. Results: Evaluated on mathematical reasoning and logic puzzle benchmarks, the framework significantly improves accuracy and reasoning efficiency while reducing error propagation and average token consumption. The authors also report that dynamic reasoning chains scale well and generalize across tasks.
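
The decoupling in point (1) can be pictured as two nested loops: a generator that only extends the chain, and an advisor that periodically reads a short progress summary and returns one of the adaptive operations from point (3). The sketch below is a hypothetical illustration, not the paper's implementation: `generate_step`, `advise`, the summary string, the strategy identifiers, and the exact effect of each operation are all assumptions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReasoningState:
    steps: List[str] = field(default_factory=list)


def apply_strategy(state: ReasoningState, strategy: str) -> None:
    """Apply a high-level strategy as an edit to the chain (hypothetical semantics)."""
    if strategy == "backtrack" and state.steps:
        state.steps.pop()                      # discard the most recent step
    elif strategy == "restart":
        state.steps.clear()                    # start again from scratch
    elif strategy == "clarify":
        state.steps.append("[clarify the ambiguous part of the problem]")
    elif strategy == "switch_path":
        state.steps.append("[try an alternative approach]")
    # "continue" (or any unrecognized value) leaves the chain unchanged


def run(generate_step: Callable[[List[str]], str],
        advise: Callable[[str], str],
        max_rounds: int,
        steps_per_round: int = 2) -> ReasoningState:
    state = ReasoningState()
    for _ in range(max_rounds):
        # Low level: the generator extends the chain one step at a time.
        for _ in range(steps_per_round):
            state.steps.append(generate_step(state.steps))
        # High level: the advisor sees only a compact progress summary.
        summary = f"{len(state.steps)} steps; last: {state.steps[-1]}"
        strategy = advise(summary)
        if strategy == "stop":
            break
        apply_strategy(state, strategy)
    return state


# Usage with stand-ins: a counter instead of an LLM and a scripted advisor.
script = iter(["continue", "backtrack", "clarify", "stop"])
final = run(generate_step=lambda steps: f"step {len(steps) + 1}",
            advise=lambda summary: next(script),
            max_rounds=10)
print(final.steps)
```

Keeping the advisor's input to a summary string is the design point being illustrated: the advisor can be swapped (a rule, a bandit, another model) without touching the generator.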

📝 Abstract

Large Language Models (LLMs) increasingly rely on prolonged reasoning chains to solve complex tasks. However, this trial-and-error approach often leads to high computational overhead and error propagation, where early mistakes can derail subsequent steps. To address these issues, we introduce Meta-Reasoner, a framework that dynamically optimizes inference-time reasoning by enabling LLMs to "think about how to think." Drawing inspiration from human meta-cognition and dual-process theory, Meta-Reasoner operates as a strategic advisor, decoupling high-level guidance from step-by-step generation. It employs "contextual multi-armed bandits" to iteratively evaluate reasoning progress, select optimal strategies (e.g., backtrack, clarify ambiguity, restart from scratch, or propose alternative approaches), and reallocate computational resources toward the most promising paths. Our evaluations on mathematical reasoning and puzzles highlight the potential of dynamic reasoning chains to overcome inherent challenges in the LLM reasoning process and show promise for broader applications, offering a scalable and adaptable solution for reasoning-intensive tasks.
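
The abstract does not say which contextual bandit algorithm Meta-Reasoner uses. As a hedged illustration only, the sketch below swaps in LinUCB, a standard contextual bandit, to choose among the strategies named above. The context features, the extra "continue" arm, the toy reward function, and every identifier here are hypothetical stand-ins rather than the paper's design.

```python
import numpy as np

# Hypothetical arm names paraphrasing the strategies in the abstract,
# plus a "continue" arm meaning "no intervention".
STRATEGIES = ["continue", "backtrack", "clarify_ambiguity", "restart", "propose_alternative"]


class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per strategy (arm)."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha                                # width of the confidence bonus
        self.A = [np.eye(dim) for _ in range(n_arms)]     # I + sum of x x^T per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # sum of reward * x per arm

    def select(self, x: np.ndarray) -> int:
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                             # ridge estimate of reward weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed reward for (arm, context) into that arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Toy environment (hypothetical): context = [bias, error_rate], where error_rate stands
# in for a progress signal such as the share of recent steps a checker flagged.
def toy_reward(strategy: str, error_rate: float) -> float:
    if strategy == "backtrack":
        return 1.0 if error_rate > 0.5 else 0.0
    if strategy == "continue":
        return 1.0 if error_rate <= 0.5 else 0.0
    return 0.0


selector = LinUCB(n_arms=len(STRATEGIES), dim=2)
x_low, x_high = np.array([1.0, 0.1]), np.array([1.0, 0.9])
for t in range(1000):
    x = x_low if t % 2 == 0 else x_high                   # alternate the two contexts
    arm = selector.select(x)
    selector.update(arm, x, toy_reward(STRATEGIES[arm], x[1]))

print(STRATEGIES[selector.select(x_low)], STRATEGIES[selector.select(x_high)])
```

The bonus term `alpha * sqrt(x^T A^{-1} x)` shrinks as an arm is tried in similar contexts, so rarely tried strategies still get sampled early on; in this toy run the selector settles on continuing when the error signal is low and backtracking when it is high.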

Problem

Research questions and friction points this paper is trying to address.

Optimize inference-time reasoning in LLMs

Reduce computational overhead and error propagation

Enhance reasoning strategies dynamically

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic reasoning optimization

Contextual multi-armed bandits

Decoupled strategic guidance
