🤖 AI Summary
Current large language models (LLMs) employ fixed-depth inference strategies and fail to adjust reasoning depth to task difficulty and uncertainty, leading to redundant computation on simple tasks and insufficient reasoning on complex ones. To address this, we propose an adaptive inference paradigm that frames inference resource allocation as a control-augmented policy optimization problem. We establish a systematic taxonomy encompassing both training-internalized and training-agnostic approaches, and formally characterize how deduction, induction, and abduction are realized in LLMs. Leveraging techniques including reinforcement learning, learned controllers, prompt-conditioned control, feedback-driven termination, and modular composition, our framework enables dynamic, fine-grained regulation of the reasoning process. We introduce a unified evaluation framework for method comparison and identify key challenges: self-assessment, meta-reasoning, and human-aligned control. This work provides both theoretical foundations and practical pathways toward efficient, controllable, and interpretable LLM inference.
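The feedback-driven termination mentioned above can be sketched in a few lines: keep generating reasoning steps while a confidence signal is below a threshold, and halt early once it clears it. This is a minimal illustration only; `generate_step` and `estimate_confidence` are hypothetical stand-ins for an LLM call and a self-assessment probe, not functions defined in the survey.

```python
# Hedged sketch of feedback-driven termination: reasoning continues only
# while an estimated confidence stays below a threshold. Both callbacks
# are hypothetical placeholders for model calls.

def adaptive_reason(problem, generate_step, estimate_confidence,
                    threshold=0.9, max_steps=8):
    """Append reasoning steps until confidence crosses `threshold`."""
    trace = []
    for _ in range(max_steps):
        trace.append(generate_step(problem, trace))
        if estimate_confidence(problem, trace) >= threshold:
            break  # feedback-driven halt: enough evidence accumulated
    return trace

# Toy usage: confidence grows with each step, so an easy problem halts early.
trace = adaptive_reason(
    "2+2",
    generate_step=lambda p, t: f"step-{len(t) + 1}",
    estimate_confidence=lambda p, t: 0.5 + 0.25 * len(t),
)
print(len(trace))  # halts after 2 steps (0.5 + 0.5 = 1.0 >= 0.9)
```

Under this scheme, the compute spent per input scales with how quickly the confidence signal saturates, rather than being fixed in advance.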
📝 Abstract
Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence. While prior surveys focus on efficiency by examining how to shorten reasoning chains or reduce computation, this view overlooks a fundamental challenge: current LLMs apply uniform reasoning strategies regardless of task complexity, generating long traces for trivial problems while failing to extend reasoning for difficult tasks. This survey reframes reasoning through the lens of *adaptivity*: the capability to allocate reasoning effort based on input characteristics such as difficulty and uncertainty. We make three contributions. First, we formalize deductive, inductive, and abductive reasoning within the LLM context, connecting these classical cognitive paradigms with their algorithmic realizations. Second, we formalize adaptive reasoning as a control-augmented policy optimization problem balancing task performance with computational cost, distinguishing learned policies from inference-time control mechanisms. Third, we propose a systematic taxonomy organizing existing methods into training-based approaches that internalize adaptivity through reinforcement learning, supervised fine-tuning, and learned controllers, and training-free approaches that achieve adaptivity through prompt conditioning, feedback-driven halting, and modular composition. This framework clarifies how different mechanisms realize adaptive reasoning in practice and enables systematic comparison across diverse strategies. We conclude by identifying open challenges in self-evaluation, meta-reasoning, and human-aligned reasoning control.
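A control-augmented policy optimization objective of the kind described in the second contribution is often written as follows. The notation here is a generic sketch, not necessarily the survey's own formulation: $x$ is an input, $c$ a control signal (e.g., a budget or prompt condition), $\tau$ a reasoning trace, $R$ the task reward, and $C$ a cost measure such as trace length.

$$
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\; \tau \sim \pi(\cdot \mid x, c)}\big[\, R(x, \tau) \;-\; \lambda\, C(\tau) \,\big]
$$

The coefficient $\lambda \ge 0$ trades task performance against computational cost: larger $\lambda$ favors shorter traces, and an adaptive policy allocates more steps only where the expected reward gain outweighs $\lambda\, C(\tau)$.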