🤖 AI Summary
This work addresses how to select an action for each input to a deployed language or vision-language model (answer directly, retrieve evidence, defer to a stronger model, or abstain), a setting where existing approaches offer no theoretical guidance on matching controller complexity to data scale. The paper introduces the first theoretical framework for controller selection. It organizes controllers into four nested complexity classes (fixed actions, partition routers, instance-level controllers, and prior-gated controllers) and prescribes a class from three data-estimable bottlenecks: the headroom over the best fixed action, whether there are enough samples for instance-level controllers to decide reliably, and the gain a coarse partition router can recover when the instance-level signal is unreliable. Using a Bernstein-tight threshold with a matching information-theoretic lower bound, together with strict nested cross-validation, the study exposes a finite-sample limitation of instance-level uncertainty signals and provably selects a near-best controller class. On SMS-Spam, HallusionBench, A-OKVQA, and FOLIO, the predicted class matches the empirical winner; on TextVQA, the prior-gated controller wins when OCR tokens supply a label-free prediction-time prior.
📝 Abstract
Deployed language and vision-language models must decide, on each input, whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. Contrary to the common monotonicity intuition, greater per-input expressivity is not uniformly beneficial in finite samples: under identical strict cross-validation, different benchmarks prefer different controller classes. This reflects a finite-sample limitation of instance-level uncertainty signals, which can be exhausted at a distribution-dependent scale. We organize controllers into a nested lattice of four classes: fixed actions, partition routers, instance-level controllers, and prior-gated controllers, ordered by complexity. We prove a regime theory that turns three data-estimable bottlenecks into a class choice: how much improvement is possible beyond the best fixed action, whether there are enough samples for instance-level controllers to make reliable decisions, and how much improvement a coarse partition router can recover when instance-level signal is unreliable. The resulting Bernstein-tight threshold has a matching information-theoretic lower bound, and strict nested cross-validation provably selects a near-best class. Across SMS-Spam, HallusionBench, A-OKVQA, and FOLIO, the predicted class matches the empirical winner; the prior-gated controller wins on TextVQA when OCR tokens supply a label-free prediction-time prior. Code is available at https://github.com/Anonymous-Awesome-Submissions/Regime-Theory.
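As a rough illustration of how nested cross-validation can choose among controller classes, the sketch below implements three toy classes (a fixed action, a partition router, and an instance-level threshold controller) and selects one per outer fold using only inner-fold scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the two-action reward model, the synthetic data, and every name in it (`fit_fixed`, `fit_partition`, `fit_instance`, `nested_cv`, `toy_data`) are hypothetical, and the prior-gated class is omitted.

```python
import random
from statistics import mean

ACTIONS = ("answer", "abstain")  # hypothetical two-action setting

def evaluate(controller, data):
    # Mean reward of a controller; each row is (group, score, rewards_by_action).
    return mean(r[controller(g, s)] for g, s, r in data)

def fit_fixed(train):
    # Fixed action: the single action with the best mean training reward.
    best = max(ACTIONS, key=lambda a: mean(r[a] for _, _, r in train))
    return lambda group, score: best

def fit_partition(train):
    # Partition router: best action per coarse group, with a fixed-action fallback.
    fallback = fit_fixed(train)
    table = {}
    for g in {g for g, _, _ in train}:
        rows = [x for x in train if x[0] == g]
        table[g] = fit_fixed(rows)(g, 0.0)
    return lambda group, score: table.get(group, fallback(group, score))

def fit_instance(train):
    # Instance-level controller: threshold on a per-input score, tuned on training reward.
    def policy(t):
        return lambda group, score: "answer" if score >= t else "abstain"
    candidates = sorted({s for _, s, _ in train}) + [float("inf")]
    best_t = max(candidates, key=lambda t: evaluate(policy(t), train))
    return policy(best_t)

CLASSES = {"fixed": fit_fixed, "partition": fit_partition, "instance": fit_instance}

def folds(data, k):
    # Split already-shuffled data into k interleaved folds.
    return [data[i::k] for i in range(k)]

def cv_score(fit, data, k):
    # Plain k-fold estimate of a class's reward when fit on k-1 folds.
    parts = folds(data, k)
    scores = []
    for i, test in enumerate(parts):
        train = [x for j, p in enumerate(parts) if j != i for x in p]
        scores.append(evaluate(fit(train), test))
    return mean(scores)

def nested_cv(data, outer_k=5, inner_k=4):
    # The outer test fold is never seen while the class is being chosen.
    parts = folds(data, outer_k)
    chosen, outer_scores = [], []
    for i, test in enumerate(parts):
        train = [x for j, p in enumerate(parts) if j != i for x in p]
        name = max(CLASSES, key=lambda c: cv_score(CLASSES[c], train, inner_k))
        chosen.append(name)
        outer_scores.append(evaluate(CLASSES[name](train), test))
    return chosen, mean(outer_scores)

def toy_data(n=200, seed=0):
    # Synthetic data: the score is informative, the group label is not.
    rows = []
    for i in range(n):
        score = (i + 0.5) / n
        rewards = {"answer": 1.0 if score > 0.5 else -1.0, "abstain": 0.0}
        rows.append((i % 2, score, rewards))
    random.Random(seed).shuffle(rows)
    return rows

chosen, score = nested_cv(toy_data())
print(chosen, round(score, 3))
```

In this toy setup the per-input score separates the two outcomes cleanly, so the instance-level class is expected to be selected. With a noisier score or far fewer samples, the inner folds could instead favor the fixed or partition class, which is the kind of regime dependence the abstract describes.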