Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute

📅 2025-09-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
While Best-of-N majority voting in large language models (LLMs) approaches optimal performance as N → ∞ (Best-of-∞), this limit is computationally infeasible. Method: We propose an efficient, budget-constrained approximation framework: (1) formalize a theoretical Best-of-∞ model to characterize its asymptotic performance bound; (2) design an answer-consistency-driven adaptive mechanism to dynamically select N per query under a fixed computational budget; (3) extend to weighted multi-model ensembling, jointly optimizing model weights and per-model sampling sizes via mixed-integer linear programming (MILP). Results: Our method closely approximates Best-of-∞ performance within finite test budgets, improving inference efficiency by up to 2.3×. The weighted ensemble achieves consistent gains over the best individual model across multiple benchmarks, with average accuracy improvements of +4.7%. This work establishes the first theoretically grounded, scalable, and optimization-aware adaptive inference paradigm for LLMs.

๐Ÿ“ Abstract
We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Analyzing majority voting performance for large language models with infinite test-time computation
Proposing adaptive generation schemes to efficiently allocate inference-time computational resources
Extending framework to weighted ensembles that outperform individual models via optimal weighting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive generation based on answer agreement
Weighted ensembles of multiple LLMs
Optimal weighting via mixed-integer linear program
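To make the last point concrete, here is a brute-force sketch of the kind of budgeted allocation problem the paper solves with a MILP. Everything here is an assumption for illustration: the function name, the per-model accuracy and cost inputs, and especially the objective (total expected correct votes), which is a simplified stand-in for the paper's actual formulation jointly optimizing model weights and per-model sampling sizes. A real MILP solver replaces this enumeration at scale.

```python
from itertools import product

def best_allocation(accuracies, costs, budget):
    """Exhaustively search integer per-model sample counts under a cost budget,
    scoring each allocation by total expected correct votes (a toy proxy for
    the MILP objective). Returns (allocation dict, best score)."""
    models = list(accuracies)
    best_alloc, best_score = None, -1.0
    ranges = [range(budget // costs[m] + 1) for m in models]
    for alloc in product(*ranges):
        cost = sum(n * costs[m] for n, m in zip(alloc, models))
        if cost > budget or sum(alloc) == 0:
            continue  # infeasible or empty allocations are skipped
        score = sum(n * accuracies[m] for n, m in zip(alloc, models))
        if score > best_score:
            best_alloc, best_score = dict(zip(models, alloc)), score
    return best_alloc, best_score

# Two hypothetical models: m1 is more accurate but twice as expensive per call.
alloc, score = best_allocation({"m1": 0.8, "m2": 0.6}, {"m1": 2, "m2": 1}, budget=8)
print(alloc, score)
```

Under this toy objective the cheaper model wins on correct votes per unit cost, which illustrates why jointly optimizing weights and sample counts can beat always calling the single best model.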