CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

📅 2026-05-07
🤖 AI Summary
This work addresses the problem of controlling error rates when sampling multiple outputs from a large language model, in settings where the answer set is not known in advance and the stopping rule is data-dependent. The paper proposes CITE (Certification by Intersection-union Testing with E-processes), an algorithm that controls false certification at any prescribed level without prior knowledge of the answer categories and under arbitrary data-driven stopping rules. Built on e-processes and intersection-union hypothesis testing, CITE provides anytime-valid certification that a prespecified target answer is the unique mode of the response distribution, a guarantee distinct from answer correctness. Its stopping-time rate does not depend on the size of the category set and matches minimax lower bounds up to constants in the main regime. Simulations and LLM self-consistency experiments show empirical error control and improved certification in diffuse-tail (long-tailed) output settings.
📝 Abstract
Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains challenging. In particular, deciding when to stop sampling is difficult when the stopping rule is data-dependent and the set of possible answers is not known in advance. We study anytime-valid certification of a prespecified target answer as the unique mode of the model's response distribution, a guarantee distinct from answer correctness. We propose the Certification by Intersection-union Testing with E-processes (CITE) algorithm, which provably controls false certification at any prescribed level under arbitrary data-driven stopping, without requiring prior knowledge of the answer category set. We also prove a category-set-size-free stopping-time rate, establish matching minimax lower bounds up to constants in the main regime, and extend the construction to confidence-weighted voting. Simulations and LLM self-consistency experiments show empirical error control and improved certification in diffuse-tail settings.
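The general idea of combining e-processes with intersection-union testing can be illustrated with a minimal Python sketch. This is not the paper's CITE algorithm, whose construction is not given in the abstract. It assumes a simple fixed-bet pairwise e-process: for each rival answer b, the wealth is multiplied by (1 + bet) when the target is sampled, by (1 - bet) when b is sampled, and is unchanged otherwise, which gives a nonnegative supermartingale whenever the target is not strictly more likely than b. The function name certify_mode and the parameters alpha and bet are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def certify_mode(
    samples: Iterable[Hashable],
    target: Hashable,
    alpha: float = 0.05,
    bet: float = 0.5,
) -> Optional[int]:
    """Illustrative sketch only (assumed construction, not the paper's CITE).

    Returns the 1-based sample index at which `target` is certified as the
    unique mode at level `alpha`, or None if certification never happens.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < bet < 1.0):
        raise ValueError("alpha and bet must lie strictly between 0 and 1")

    log_threshold = math.log(1.0 / alpha)  # Ville's inequality threshold
    log_win = math.log1p(bet)              # log-wealth gain per target sample
    log_loss = math.log1p(-bet)            # log-wealth loss per rival sample
    counts: Counter = Counter()

    for n, answer in enumerate(samples, start=1):
        counts[answer] += 1
        # Intersection-union step: every rival's e-process must cross the
        # threshold. With a shared fixed bet, the smallest e-process belongs
        # to the rival seen most often; rivals never seen have count 0.
        top_rival = max(
            (c for a, c in counts.items() if a != target), default=0
        )
        log_e = counts[target] * log_win + top_rival * log_loss
        if log_e >= log_threshold:
            return n
    return None
```

Under these assumptions, no list of possible answers is needed up front: an answer that has not yet appeared has the largest possible e-value, so only the most frequent observed rival constrains the decision. Because each rival's e-process is checked against the same threshold 1/alpha, the intersection-union rule needs no multiplicity correction, and the check can be run after every sample without inflating the false-certification rate.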
Problem

Research questions and friction points this paper is trying to address.

anytime-valid inference
LLM self-consistency
error control
data-dependent stopping
answer certification
Innovation

Methods, ideas, or system contributions that make the work stand out.

anytime-valid inference
self-consistency
E-processes
intersection-union testing
data-dependent stopping