🤖 AI Summary
Existing sequential scaling methods rely predominantly on heuristic strategies and lack theoretical guarantees, which limits both performance and interpretability. This work pioneers a formal model of sequential scaling as a two-state Markov process, from which we derive sufficient conditions for accuracy improvement and establish provable upper and lower bounds on performance. Building on this theoretical foundation, we develop a closed-form optimization solution that enables principle-driven inference scheduling. Evaluated across three prominent large language models, five benchmark datasets, and over twenty experimental configurations, our approach consistently outperforms existing parallel and sequential scaling strategies, achieving significant gains in both inference efficiency and accuracy.
📝 Abstract
Sequential scaling is a prominent inference-time scaling paradigm, yet its performance improvements are typically modest and not well understood, largely because prevailing approaches are heuristic and non-principled, obscuring clear optimality bounds. To address this, we propose a principled framework that models sequential scaling as a two-state Markov process. This formulation reveals the underlying properties of sequential scaling and yields closed-form solutions for its essential aspects, such as the specific conditions under which accuracy improves and the theoretical upper, neutral, and lower performance bounds. Leveraging this formulation, we develop MarkovScale, a practical system that applies these optimality criteria to achieve a theoretically grounded balance between accuracy and efficiency. Comprehensive experiments across 3 backbone LLMs, 5 benchmarks, and over 20 configurations show that MarkovScale consistently outperforms state-of-the-art parallel and sequential scaling methods, representing a significant step toward optimal and resource-efficient inference in LLMs. The source code will be released upon acceptance at https://open-upon-acceptance.
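To make the two-state intuition concrete, here is a minimal sketch (not the paper's implementation) of the kind of Markov model the abstract describes. State 1 means the current answer is correct and state 0 means it is incorrect; each sequential revision step fixes a wrong answer with probability `p` and corrupts a correct one with probability `q`. The values `a0`, `p`, and `q` are hypothetical numbers chosen purely for illustration; the paper's actual conditions and bounds may be formulated differently.

```python
def accuracy_after(a0: float, p: float, q: float, t: int) -> float:
    """Accuracy after t revision steps via the recurrence
    a_{t+1} = a_t * (1 - q) + (1 - a_t) * p."""
    a = a0
    for _ in range(t):
        a = a * (1 - q) + (1 - a) * p
    return a

def accuracy_closed_form(a0: float, p: float, q: float, t: int) -> float:
    """Closed form of the same recurrence:
    a_t = a* + (a0 - a*) * (1 - p - q)^t, with limit a* = p / (p + q)."""
    a_star = p / (p + q)
    return a_star + (a0 - a_star) * (1 - p - q) ** t

# Hypothetical demonstration values: initial accuracy 0.4,
# fix probability 0.3, corruption probability 0.1.
a0, p, q = 0.4, 0.3, 0.1
for t in (1, 5, 20):
    print(f"t={t:2d}  recurrence={accuracy_after(a0, p, q, t):.4f}  "
          f"closed-form={accuracy_closed_form(a0, p, q, t):.4f}")
```

In this toy model, accuracy improves with each step exactly when the current accuracy is below the limit `p / (p + q)` (0.75 here), which illustrates the flavor of "sufficient conditions for improvement" and "upper/neutral/lower bounds" the abstract refers to.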