LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems

📅 2025-12-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently produce unreliable outputs, and existing uncertainty estimation methods lack statistical guarantees for reliably identifying incorrect answers. To address this, we propose a false discovery rate (FDR) control framework that formulates selective prediction as a decision problem with linear expectation constraints, deriving a sufficient condition for finite-sample FDR control via calibration sets. We further design an uncertainty-aware dual-model routing mechanism that intelligently allocates tasks across models while maintaining unified FDR guarantees. Our method performs offline calibration under the exchangeability assumption, balancing theoretical rigor with practical applicability. Experiments on multiple question-answering benchmarks demonstrate significant improvements in both FDR control accuracy and effective sample retention rate. Moreover, under strict FDR control, our approach substantially increases the number of correctly accepted answers, thereby enhancing reliability without sacrificing coverage.

๐Ÿ“ Abstract
Large language models (LLMs) often generate unreliable answers, while heuristic uncertainty methods fail to fully distinguish correct from incorrect predictions, causing users to accept erroneous answers without statistical guarantees. We address this issue through the lens of false discovery rate (FDR) control, ensuring that among all accepted predictions, the proportion of errors does not exceed a target risk level. To achieve this in a principled way, we propose LEC, which reinterprets selective prediction as a constrained decision problem by enforcing a Linear Expectation Constraint over selection and error indicators. Then, we establish a finite-sample sufficient condition, which relies only on a held-out set of exchangeable calibration samples, to compute an FDR-constrained, coverage-maximizing threshold. Furthermore, we extend LEC to a two-model routing mechanism: given a prompt, if the current model's uncertainty exceeds its calibrated threshold, we delegate it to a stronger model, while maintaining a unified FDR guarantee. Evaluations on closed-ended and open-ended question-answering (QA) datasets show that LEC achieves tighter FDR control and substantially improves sample retention over prior methods. Moreover, the two-model routing mechanism achieves lower risk levels while accepting more correct samples than each individual model.
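The calibration step described in the abstract can be sketched as a conformal-style threshold search: choose the most permissive uncertainty cutoff whose finite-sample-corrected empirical FDR on a held-out calibration set stays at or below the target risk level α. This is a minimal illustrative sketch, assuming a simple plug-in estimator with a `+1` correction; the function name and the exact correction are assumptions, not the paper's precise sufficient condition.

```python
import numpy as np

def calibrate_threshold(u_cal, err_cal, alpha):
    """Pick the most permissive uncertainty threshold whose empirical FDR
    on the calibration set (with a conservative +1 correction in the
    numerator) stays at or below the target risk level alpha.

    u_cal:   uncertainty scores for calibration prompts (lower = more confident)
    err_cal: 1 if the model's calibration answer is wrong, else 0
    """
    u_cal = np.asarray(u_cal, dtype=float)
    err_cal = np.asarray(err_cal, dtype=int)
    best = -np.inf  # default: reject everything (no threshold is safe)
    for t in np.sort(u_cal):
        sel = u_cal <= t  # answers that would be accepted at threshold t
        # conservative empirical estimate of FDR among accepted answers
        fdr_hat = (err_cal[sel].sum() + 1) / max(int(sel.sum()), 1)
        if fdr_hat <= alpha:
            best = max(best, t)
    return best
```

At test time, an answer with uncertainty at or below the returned threshold is accepted; everything else is rejected (or, in the routing setting, escalated).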
Problem

Research questions and friction points this paper is trying to address.

Control false discovery rate in LLM predictions
Ensure error proportion below target risk level
Extend FDR control to two-model routing systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Expectation Constraint for FDR control
Finite-sample threshold from exchangeable calibration data
Two-model routing with unified FDR guarantee
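The two-model routing idea above can be sketched as a simple cascade: accept the first model's answer when its uncertainty clears its calibrated threshold, otherwise escalate to the stronger model and apply that model's threshold. All names below are hypothetical, and in the paper the two thresholds would be calibrated jointly so that the unified FDR guarantee holds across both models.

```python
def route_and_decide(u_small, t_small, answer_small, u_large_fn, t_large):
    """Hypothetical two-model cascade with abstention.

    u_small, answer_small: the small model's uncertainty and answer
    u_large_fn: zero-argument callable returning (uncertainty, answer)
                from the stronger model, invoked only on escalation
    t_small, t_large: calibrated acceptance thresholds per model
    Returns (answer_or_None, which_model).
    """
    if u_small <= t_small:
        return answer_small, "small"  # small model is confident enough
    u_large, answer_large = u_large_fn()  # escalate to the stronger model
    if u_large <= t_large:
        return answer_large, "large"
    return None, "abstain"  # neither model clears its threshold
```

Lazily querying the stronger model only on escalation keeps the cascade cheap while still covering prompts the small model is unsure about.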