Adaptive Stopping for Multi-Turn LLM Reasoning

📅 2026-04-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) lack a theoretically grounded adaptive stopping mechanism for multi-round reasoning, often relying on heuristic rules or fixed iteration counts that struggle to balance accuracy, computational cost, and latency. This work proposes MiCP, a novel framework that introduces conformal prediction into multi-round LLM inference for the first time. By dynamically allocating error budgets across reasoning rounds, MiCP enables early stopping while guaranteeing overall coverage validity. We further introduce new evaluation metrics that jointly assess coverage and efficiency, and demonstrate through experiments on both single-hop and multi-hop question answering benchmarks that MiCP significantly reduces the number of inference rounds, computational overhead, and prediction set size, all while achieving the desired coverage guarantees.
πŸ“ Abstract
Large Language Models (LLMs) increasingly rely on multi-turn reasoning and interaction, such as adaptive retrieval-augmented generation (RAG) and ReAct-style agents, to answer difficult questions. These methods improve accuracy by iteratively retrieving information, reasoning, or acting, but introduce a key challenge: when should the model stop? Existing approaches rely on heuristic stopping rules or fixed turn budgets and provide no formal guarantees that the final prediction still contains the correct answer. This limitation is particularly problematic in high-stakes domains such as finance and healthcare, where unnecessary turns increase cost and latency, while stopping too early risks incorrect decisions. Conformal prediction (CP) provides formal coverage guarantees, but existing LLM-CP methods only apply to a single model output and cannot handle multi-turn pipelines with adaptive stopping. To address this gap, we propose Multi-Turn Language Models with Conformal Prediction (MiCP), the first CP framework for multi-turn reasoning. MiCP allocates different error budgets across turns, enabling the model to stop early while maintaining an overall coverage guarantee. We demonstrate MiCP on adaptive RAG and ReAct, where it achieves the target coverage on both single-hop and multi-hop question answering benchmarks while reducing the number of turns, inference cost, and prediction set size. We further introduce a new metric that jointly evaluates coverage validity and answering efficiency.
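The abstract's core idea (splitting an overall error budget across turns so that early stopping preserves coverage) can be sketched with standard split conformal prediction. Everything below is an illustrative assumption, not the paper's actual algorithm: the function names, the nonconformity scores, the singleton-set stopping rule, and the union-bound budget split are all hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    # Split conformal prediction: return the (1 - alpha) empirical
    # quantile of calibration nonconformity scores, with the standard
    # (n + 1) finite-sample correction.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def multi_turn_predict(turn_scores, cal_scores_per_turn, alpha, budgets):
    # turn_scores: per-turn dicts mapping candidate answers to
    #   nonconformity scores (lower = more conforming).
    # budgets: hypothetical per-turn error budgets alpha_t with
    #   sum(budgets) <= alpha, so a union bound keeps overall
    #   miscoverage at most alpha.
    assert sum(budgets) <= alpha + 1e-9
    pred_set = set()
    for t, (scores, cal) in enumerate(zip(turn_scores, cal_scores_per_turn)):
        q = conformal_threshold(cal, budgets[t])
        pred_set = {ans for ans, s in scores.items() if s <= q}
        if len(pred_set) == 1:  # confident enough: stop early
            return pred_set, t + 1
    return pred_set, len(turn_scores)
```

For example, with two turns and a shared calibration set, the sketch keeps both candidates after a low-evidence first turn and stops at the second turn once the set shrinks to a singleton:

```python
cal = [i / 10 for i in range(1, 10)]
pred_set, turns_used = multi_turn_predict(
    [{"a": 0.2, "b": 0.85}, {"a": 0.1, "b": 0.95}],
    [cal, cal], alpha=0.2, budgets=[0.1, 0.1],
)
# pred_set == {"a"}, turns_used == 2
```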
Problem

Research questions and friction points this paper is trying to address.

adaptive stopping
multi-turn reasoning
conformal prediction
LLM
coverage guarantee
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Prediction
Multi-turn Reasoning
Adaptive Stopping
Retrieval-Augmented Generation
ReAct
Xiaofan Zhou
Department of Computer Science, University of Illinois Chicago
Huy Nguyen
Data Science, Augustana College
Bo Yu
Department of Civil and Environmental Engineering, University of Utah
Chenxi Liu
Department of Civil and Environmental Engineering, University of Utah
Lu Cheng
Assistant Professor, UIC CS
Socially Responsible AI
Causal Machine Learning
Data Mining
AI for Good