🤖 AI Summary
Large language models (LLMs) hallucinate, and existing uncertainty quantification (UQ) methods do not capture both global (batch-level) and local (instance-level) uncertainty. The gap is sharpest in black-box settings, where existing methods provide only a global score and cannot assess the reliability of individual responses. This paper introduces the first black-box UQ framework that estimates both. It applies archetypal analysis to embeddings of sampled responses. It then uses the convex hull volume of the resulting archetypes (Geometric Volume) to quantify batch-level uncertainty, and defines a Geometric Suspicion score that ranks individual responses by reliability. The paper also proves a link between convex hull volume and entropy. The method needs no access to model internals, only sampled responses and their embeddings. On short-form question-answering benchmarks it performs comparably to or better than prior methods. On medical datasets, where hallucinations are especially costly, it achieves superior results. Ranking responses by Geometric Suspicion also enables hallucination reduction through preferential response selection.
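To make the batch-level score concrete, here is a minimal sketch of a convex hull volume for k archetype embeddings. It assumes the archetypes are affinely independent with k <= d + 1. That is typical when a handful of archetypes sit in a high-dimensional embedding space, and it makes their hull a (k - 1)-simplex whose volume follows from a Gram determinant. The function name `simplex_volume` is hypothetical, and the paper's exact volume computation may differ.

```python
import math

import numpy as np


def simplex_volume(archetypes: np.ndarray) -> float:
    """(k-1)-dimensional volume of the convex hull of k archetypes.

    Assumes the rows of `archetypes` (shape (k, d)) are affinely independent
    with k <= d + 1, so the hull is a (k-1)-simplex. Uses
        vol = sqrt(det(V V^T)) / (k-1)!,
    where the rows of V are the edge vectors a_i - a_0.
    """
    k = archetypes.shape[0]
    if k < 2:
        return 0.0                                  # a single point has no volume
    edges = archetypes[1:] - archetypes[0]          # shape (k-1, d)
    gram = edges @ edges.T                          # shape (k-1, k-1)
    det = max(float(np.linalg.det(gram)), 0.0)      # clip tiny negative round-off
    return math.sqrt(det) / math.factorial(k - 1)
```

A general-purpose routine such as `scipy.spatial.ConvexHull` expects a full-dimensional point set. A few archetypes in an embedding space with hundreds of dimensions do not form one. The Gram formula avoids this by measuring volume inside the archetypes' own affine span. Volumes computed with different k have different units, so compare scores only across runs that use the same k.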
📝 Abstract
Large language models demonstrate impressive results across diverse tasks but are still known to hallucinate, generating linguistically plausible but incorrect answers to questions. Uncertainty quantification has been proposed as a strategy for hallucination detection, but no existing black-box approach provides estimates for both global and local uncertainty. The former attributes uncertainty to a batch of responses, while the latter attributes uncertainty to individual responses. Current local methods typically rely on white-box access to internal model states, whilst black-box methods only provide global uncertainty estimates. We introduce a geometric framework to address this, based on archetypal analysis of batches of responses sampled with only black-box model access. At the global level, we propose Geometric Volume, which measures the convex hull volume of archetypes derived from response embeddings. At the local level, we propose Geometric Suspicion, which ranks responses by reliability and enables hallucination reduction through preferential response selection. Prior dispersion methods yield only a single global score. Our approach instead yields semantic boundary points that can be used to attribute reliability to individual responses. Experiments show that our framework performs comparably to or better than prior methods on short-form question-answering datasets. It achieves superior results on medical datasets, where hallucinations carry particularly serious risks. We also provide theoretical justification by proving a link between convex hull volume and entropy.
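The framework starts from archetypal analysis of response embeddings. Below is a minimal sketch of classic archetypal analysis (Cutler and Breiman) fitted by alternating projected gradient steps. The solver, the initialisation at k random responses, and the iteration count are all assumptions, since the abstract does not say how the authors fit the model. `X` is assumed to hold one embedding per sampled response, produced by any sentence-embedding model.

```python
import numpy as np


def project_rows_to_simplex(M: np.ndarray) -> np.ndarray:
    """Euclidean projection of every row of M onto the probability simplex."""
    n, m = M.shape
    U = -np.sort(-M, axis=1)                         # each row sorted descending
    css = np.cumsum(U, axis=1) - 1.0
    cond = U - css / np.arange(1, m + 1) > 0         # candidate support sizes
    rho = m - 1 - np.argmax(cond[:, ::-1], axis=1)   # largest index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(M - theta[:, None], 0.0)


def archetypal_analysis(X: np.ndarray, k: int, n_iter: int = 500, seed: int = 0):
    """Fit X approx A @ Z with archetypes Z = B @ X.

    Rows of A (n, k) and B (k, n) lie on the probability simplex, so each
    response is a convex mixture of archetypes and each archetype is a convex
    mixture of responses (it lies on or inside the data's convex hull).
    Each block is updated by a projected gradient step with step size 1/L.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    B = np.zeros((k, n))
    B[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0  # start at k responses
    A = np.full((n, k), 1.0 / k)
    lip_X = np.linalg.norm(X @ X.T, 2)               # spectral norm, reused by the B-step
    for _ in range(n_iter):
        Z = B @ X
        # A-step: minimise 0.5 * ||X - A Z||_F^2 over row-stochastic A.
        lip_A = np.linalg.norm(Z @ Z.T, 2) + 1e-12
        A = project_rows_to_simplex(A - (A @ Z - X) @ Z.T / lip_A)
        # B-step: minimise 0.5 * ||X - A B X||_F^2 over row-stochastic B.
        lip_B = np.linalg.norm(A.T @ A, 2) * lip_X + 1e-12
        B = project_rows_to_simplex(B - A.T @ (A @ B @ X - X) @ X.T / lip_B)
    return A, B @ X
```

Each step uses the inverse Lipschitz constant of its block's gradient, so the reconstruction error never increases. Projected gradient is chosen here for brevity. Dedicated solvers, such as the PCHA algorithm, typically converge faster. The returned archetypes are the points whose hull volume would feed a batch-level score such as Geometric Volume.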
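The abstract does not define how Geometric Suspicion is computed. The sketch below is a hypothetical stand-in that illustrates only the idea of ranking responses and selecting one preferentially. It scores each response by how much of its archetypal weight falls on archetypes that receive little support across the batch. The names `hypothetical_suspicion` and `select_least_suspicious`, and the scoring rule itself, are assumptions, not the paper's method.

```python
import numpy as np


def hypothetical_suspicion(A: np.ndarray) -> np.ndarray:
    """Illustrative suspicion score; NOT the paper's Geometric Suspicion.

    A has shape (n_responses, k); row i holds response i's archetypal weights
    (non-negative, summing to 1). An archetype's support is the average weight
    it receives across the batch. A response scores higher (more suspicious)
    the more of its weight sits on weakly supported archetypes.
    """
    support = A.mean(axis=0)        # shape (k,); sums to 1 when rows of A do
    return A @ (1.0 - support)      # shape (n_responses,)


def select_least_suspicious(responses, A):
    """Preferential selection: keep the response with the lowest score."""
    scores = hypothetical_suspicion(A)
    order = np.argsort(scores, kind="stable")   # most reliable first
    return responses[int(order[0])], order
```

Consider a batch of four sampled answers where three load fully on one archetype and one loads on another. The deviating answer receives the highest score, and one of the three agreeing answers is returned.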
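The abstract states that the authors prove a link between convex hull volume and entropy but does not give the result. As a hedged illustration of why such a link is natural, consider a standard fact that is not necessarily the paper's theorem. If a random embedding X is uniformly distributed on a convex body K in R^d with finite, positive volume, its differential entropy is exactly the log-volume. No density supported on K has higher differential entropy.

```latex
p(x) = \frac{\mathbf{1}[x \in K]}{\operatorname{vol}(K)}, \qquad
h(X) = -\int_K p(x)\,\log p(x)\,dx
     = \log \operatorname{vol}(K) \int_K p(x)\,dx
     = \log \operatorname{vol}(K),
\qquad
h(q) \le \log \operatorname{vol}(K) \ \text{for any density } q \text{ supported on } K.
```

A larger hull therefore leaves room for more spread-out, higher-entropy responses. This is consistent with reading Geometric Volume as a batch-level uncertainty measure.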