Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of multi-path sampling in large language models (LLMs) for mathematical and reasoning tasks by proposing an adaptive stopping strategy grounded in Bayesian priors. The method tracks only the counts of the top $L-1$ most frequent answers (termed the "L-aggregated" strategy) and terminates sampling early once sufficient answer consensus is reached. Theoretically, $L=3$ is shown to suffice for asymptotic optimality, strictly dominating prior-free baselines. Experimental results demonstrate that the proposed strategy reduces LLM invocation counts by up to 50% while maintaining comparable accuracy, thereby significantly lowering inference costs.

📝 Abstract
A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
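To make the L-aggregated idea concrete, here is a minimal sketch of such a sampling loop. It tracks only the top L-1 answer counts (with L=3, the mode and runner-up) and stops early once the mode is sufficiently ahead. Note the stopping rule here is a simple lead-threshold heuristic standing in for the paper's Bayesian posterior criterion, and the sampler, threshold, and budget values are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter

def l_aggregated_stop(sample_answer, max_samples=40, L=3, lead_threshold=4):
    """Repeatedly sample an answer and stop once consensus looks settled.

    Only the top L-1 answer counts are consulted (the "L-aggregated"
    statistic). The lead-threshold test below is a simplification of the
    paper's Bayesian posterior stopping rule.
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        top = counts.most_common(L - 1)  # only the top L-1 counts
        lead = top[0][1] - (top[1][1] if len(top) > 1 else 0)
        if lead >= lead_threshold:
            break  # mode is far enough ahead of the runner-up: stop early
    return top[0][0], n  # (consensus answer, number of LLM calls used)

# Usage: a stub "LLM" that returns the correct answer 70% of the time.
random.seed(0)
answer, calls = l_aggregated_stop(
    lambda: "42" if random.random() < 0.7 else random.choice(["41", "43"]))
print(answer, calls)
```

With a reliable sampler the loop typically halts well before the budget, which is the source of the inference-cost savings the paper reports.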
Problem

Research questions and friction points this paper is trying to address.

LLM consistency
sampling efficiency
inference cost
Bayesian stopping
answer accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian stopping
LLM consistency
efficient inference
L-aggregated policy
optimal sampling