Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs

📅 2025-07-02
🤖 AI Summary
Large language models (LLMs) suffer from inefficient inference: a fixed computational budget aligns poorly with varying task complexity, causing over-computation on simple tasks and under-computation on complex ones. Method: This paper presents a systematic survey of adaptive and controllable test-time compute (TTC) strategies, organized in a two-level taxonomy: L1 (controllable inference under a fixed budget) and L2 (adaptive, dynamic resource allocation). It covers dynamic scaling, confidence-guided early exiting, and hybrid inference modes that jointly optimize token efficiency and performance. Contribution/Results: Empirical evaluation of mainstream closed-source LLMs across multiple benchmarks quantifies the trade-off between reasoning performance and computational cost. The work delivers both a conceptual framework and an empirical benchmark for efficient, user-constrained, and resource-adaptive LLM inference, emphasizing practicality, scalability, and responsiveness to user-specified constraints.

📝 Abstract
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, current models remain inefficient at reasoning: they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a two-tiered taxonomy that distinguishes between L1-controllability, methods that operate under fixed compute budgets, and L2-adaptiveness, methods that dynamically scale inference based on input difficulty or model confidence. We benchmark leading proprietary LLMs across diverse datasets, highlighting critical trade-offs between reasoning performance and token usage. Compared to prior surveys on efficient reasoning, our review emphasizes the practical control, adaptability, and scalability of TTC methods. Finally, we discuss emerging trends such as hybrid thinking models and identify key challenges for future work towards making LLMs more computationally efficient, robust, and responsive to user constraints.
Problem

Research questions and friction points this paper is trying to address.

Optimizing LLM reasoning efficiency under fixed compute budgets
Dynamically scaling inference based on input difficulty or confidence
Balancing reasoning performance and token usage in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-tiered taxonomy for test-time compute strategies
Dynamic scaling of inference based on input difficulty
Benchmarking trade-offs between performance and token usage
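The L1/L2 distinction underlying these contributions can be sketched as a stopping rule: a fixed step budget caps spend (L1 controllability), while a confidence threshold ends reasoning early on easy inputs (L2 adaptiveness). The sketch below is illustrative only and is not the paper's method or API; `early_exit_reasoning` and `step_confidences` are hypothetical names, and the per-step confidences stand in for signals a real system would derive from model logits.

```python
def early_exit_reasoning(step_confidences, threshold=0.9, max_steps=8):
    """Consume reasoning steps until confidence clears the threshold.

    Returns (steps_used, final_confidence). The max_steps cap plays the
    role of a fixed budget (L1); the threshold supplies the adaptive
    stopping rule (L2).
    """
    steps_used = 0
    confidence = 0.0
    for confidence in step_confidences[:max_steps]:
        steps_used += 1
        if confidence >= threshold:
            break  # confident enough: stop spending test-time compute
    return steps_used, confidence

# An easy problem reaches high confidence quickly...
easy = early_exit_reasoning([0.55, 0.93, 0.97])
# ...while a hard one exhausts the full budget at low confidence.
hard = early_exit_reasoning([0.3, 0.4, 0.5, 0.6, 0.62, 0.65, 0.7, 0.72])

print(easy)  # (2, 0.93)
print(hard)  # (8, 0.72)
```

This is the token-efficiency lever the survey studies: under the same budget, adaptive stopping spends fewer steps on inputs where the model is already confident.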
Mohammad Ali Alomrani
University of Toronto
Machine Learning
Yingxue Zhang
Huawei Noah's Ark Lab
Derek Li
Huawei Noah's Ark Lab
Qianyi Sun
Huawei Noah's Ark Lab
Soumyasundar Pal
Huawei Noah's Ark Lab Canada, McGill University
Monte Carlo methods, Bayesian inference, graph neural networks, time-series forecasting
Zhanguang Zhang
Huawei Noah's Ark Lab
Yaochen Hu
Huawei Technologies Canada, University of Alberta
Large-scale machine learning, optimization, recommender systems, approximation algorithms, statistical machine learning
Rohan Deepak Ajwani
Huawei Noah's Ark Lab
Antonios Valkanas
McGill University & Mila
Machine Learning
Raika Karimi
Huawei Noah's Ark Lab
Peng Cheng
Huawei Noah's Ark Lab
Yunzhou Wang
Huawei Noah's Ark Lab
Pengyi Liao
Huawei Noah's Ark Lab
Hanrui Huang
Huawei Noah's Ark Lab
Bin Wang
Huawei Noah's Ark Lab
Jianye Hao
Huawei Noah's Ark Lab / Tianjin University
Multiagent systems, embodied AI
Mark Coates
Professor of Electrical Engineering, McGill University
Signal processing, computer networks