🤖 AI Summary
This work addresses the challenge of reliably controlling hallucinations in large language models (LLMs) at test time, where existing conformal methods offer only marginal coverage guarantees, leading to uneven coverage across samples of varying difficulty and overly large prediction sets. To overcome this limitation, the authors propose the Conditional Factuality Control (CFC) framework, which provides the first conditional coverage guarantee for LLM outputs. CFC generates prediction sets using feature-conditional acceptance thresholds, constructed via calibrated quantile regression on a latent success score, and applies a fixed-point threshold rule at inference time. The CFC-PAC variant further delivers finite-sample, probably approximately correct (PAC)-style generalization certificates. Empirical evaluations on synthetic data, question-answering benchmarks, and the Flickr8k vision-language task demonstrate that CFC achieves near-target coverage uniformly across easy and hard instances while yielding substantially smaller prediction sets than baseline methods.
📝 Abstract
Large language models (LLMs) need reliable test-time control of hallucinations. Existing conformal methods for LLMs typically provide only \emph{marginal} guarantees and rely on a single global threshold, which can under-cover hard prompts, over-cover easy ones, and produce oversized prediction sets. We propose \emph{Conditional Factuality Control} (CFC), a post-hoc conformal framework that returns \emph{set-valued} outputs with \emph{conditional} coverage guarantees. CFC defines a continuous, feature-conditional acceptance threshold through augmented quantile regression on a latent ``success'' score, and deploys it via a fixed-point threshold rule at inference time. Theoretically, we show that CFC satisfies a conditional coverage guarantee under exchangeability and analyze its \emph{efficiency}, proving that, under mild assumptions on the score distributions, the conditional rule is strictly more sample-efficient than marginal conformal prediction at the same target coverage. We further derive a PAC-style variant, CFC-PAC, which shrinks the nominal risk level based on a stability bound, yielding a finite-sample certificate that the conditional miscoverage deviates from the target by at most $O(\sqrt{\log(1/\delta)/N})$. Empirically, on synthetic data, real-world reasoning and QA benchmarks, and a Flickr8k VLM setting, CFC and CFC-PAC consistently attain near-target coverage across difficulty groups while using smaller prediction sets than conformal and non-conformal baselines.
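To make the two mechanisms in the abstract concrete, here is a minimal, illustrative Python sketch, not the paper's implementation. It shows (a) the standard split-conformal quantile, (b) a simplified feature-conditional variant that computes a separate threshold per difficulty group (the paper instead fits a continuous threshold via augmented quantile regression), and (c) a Hoeffding-style shrinkage of the nominal risk level in the spirit of the $O(\sqrt{\log(1/\delta)/N})$ PAC certificate. All function names and the binned-group simplification are assumptions for illustration.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample-adjusted empirical quantile used in split conformal
    prediction: the ceil((n+1)*(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(scores)[k - 1]

def conditional_thresholds(scores, groups, alpha):
    """Simplified feature-conditional rule: one conformal threshold per
    discrete difficulty group (a binned stand-in for the paper's
    continuous, regression-based threshold)."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    return {g: conformal_quantile(ss, alpha) for g, ss in by_group.items()}

def pac_adjusted_alpha(alpha, n_cal, delta):
    """Illustrative PAC-style shrinkage of the nominal risk level:
    subtract a Hoeffding-type deviation term sqrt(log(1/delta)/(2N)).
    The paper's CFC-PAC bound is analogous in spirit, not identical."""
    return max(0.0, alpha - math.sqrt(math.log(1.0 / delta) / (2 * n_cal)))

if __name__ == "__main__":
    random.seed(0)
    # Simulate nonconformity scores: "hard" prompts have larger scores,
    # so a single global threshold would under-cover them.
    easy = [random.uniform(0.0, 1.0) for _ in range(200)]
    hard = [random.uniform(0.0, 2.0) for _ in range(200)]
    scores = easy + hard
    groups = ["easy"] * 200 + ["hard"] * 200

    thr = conditional_thresholds(scores, groups, alpha=0.1)
    print("per-group thresholds:", thr)           # hard > easy
    print("global threshold:", conformal_quantile(scores, 0.1))
    print("PAC-adjusted alpha:", pac_adjusted_alpha(0.1, 400, 0.05))
```

The per-group thresholds adapt to difficulty: the "hard" group gets a larger acceptance threshold, restoring near-target coverage there while letting the "easy" group use a tighter one, which is the intuition behind the smaller prediction sets reported for CFC.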