SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

📅 2025-11-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Current test-time computation scaling for LLMs in mathematical reasoning employs uniform resource allocation across subproblems, leading to insufficient resources for hard subproblems, redundant computation on easy ones, and diminishing marginal returns. Method: We propose a dynamic, difficulty-aware resource allocation framework: (1) decompose problems into subtasks; (2) estimate each subtask's difficulty; and (3) adaptively select between fast, intuitive (System 1) and slow, deliberative (System 2) processing modes based on difficulty, while maintaining reasoning coherence via contextual propagation. This enables fine-grained, sequential computational scheduling. Results: On AIME25, our method improves accuracy by 13.75 percentage points and reduces computational cost by 33%–53% over uniform scaling baselines. It is the first work to introduce cognitively inspired, dynamic resource allocation for test-time LLM reasoning optimization.

πŸ“ Abstract
Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during inference. However, current methods employ uniform resource distribution across all reasoning sub-problems, creating fundamental bottlenecks: challenging sub-problems receive insufficient attention while routine operations consume disproportionate resources, so additional computation yields diminishing returns. Inspired by dual-process theory, we propose SCALE (Selective Resource Allocation), a framework that selectively allocates computational resources based on sub-problem difficulty. SCALE operates through four stages: (1) problem decomposition into sequential reasoning sub-problems, (2) difficulty assessment of each sub-problem to distinguish routine operations from computationally challenging ones, (3) selective processing mode assignment, with System 1 for simple sub-problems and System 2 for complex ones, and (4) sequential execution with context propagation. By concentrating resources on challenging sub-problems while processing routine operations efficiently, SCALE achieves substantial performance improvements with superior resource utilization. Extensive experiments demonstrate that SCALE significantly outperforms uniform scaling baselines, improving accuracy by up to 13.75 percentage points (57.50% to 71.25% on AIME25) while reducing computational costs by 33%–53%, addressing fundamental limitations of current test-time scaling approaches.
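The four-stage pipeline in the abstract can be sketched in miniature. Everything below is an illustrative assumption, not the authors' implementation: the decomposition, the length-based difficulty proxy, the threshold, and the `system1`/`system2` stubs (which would be cheap vs. extended LLM inference in the paper) are all hypothetical stand-ins.

```python
# Hypothetical sketch of SCALE's four stages. Function names, the
# difficulty heuristic, and the 0.5 threshold are illustrative
# assumptions; in the paper these would be LLM-driven.

def decompose(problem: str) -> list[str]:
    """Stage 1: split a problem into sequential sub-problems (toy version)."""
    return [s.strip() for s in problem.split(";") if s.strip()]

def estimate_difficulty(subproblem: str) -> float:
    """Stage 2: score difficulty in [0, 1]; here a crude length proxy."""
    return min(len(subproblem) / 100.0, 1.0)

def system1(subproblem: str, context: str) -> str:
    """Fast, intuitive pass (e.g., a single low-cost decode)."""
    return f"quick answer to: {subproblem}"

def system2(subproblem: str, context: str) -> str:
    """Slow, deliberative pass (e.g., extended chain-of-thought)."""
    return f"deliberate answer to: {subproblem}"

def scale(problem: str, threshold: float = 0.5) -> list[str]:
    """Stages 3-4: route each sub-problem by difficulty,
    propagating context so later steps see earlier results."""
    context = ""
    answers = []
    for sub in decompose(problem):
        solver = system2 if estimate_difficulty(sub) >= threshold else system1
        answer = solver(sub, context)
        context += f"\n{sub} -> {answer}"  # context propagation
        answers.append(answer)
    return answers

print(scale("compute 2+2; prove the sum of two even integers "
            "is even and justify each algebraic step carefully"))
```

The key design point is that routing happens per sub-problem, sequentially, so an easy arithmetic step never pays the cost of the deliberative mode, while a hard proof step always does.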
Problem

Research questions and friction points this paper is trying to address.

Uniform test-time resource distribution creates performance bottlenecks in multi-step reasoning
Hard sub-problems receive insufficient compute while routine operations consume disproportionate resources
How to improve mathematical reasoning accuracy while reducing, rather than increasing, computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selectively allocates compute based on sub-problem difficulty
Uses dual-process theory to assign simple or complex processing
Improves accuracy while reducing computational costs significantly
Yang Xiao
The Hong Kong Polytechnic University
Chunpu Xu
PolyU
Multimodal learning, Natural language processing
Ruifeng Yuan
Ph.D. from The Hong Kong Polytechnic University
Natural language processing
Jiashuo Wang
The Hong Kong Polytechnic University
Wenjie Li
The Hong Kong Polytechnic University
Pengfei Liu
Shanghai Jiao Tong University, SII