🤖 AI Summary
Existing confidence estimation methods for large language models are brittle under distribution shift, on domain-specialised text, and under compute limits. This work proposes Structural Confidence, a single-pass, model-agnostic framework that builds a post-hoc confidence estimator from multi-scale structural signals (spectral, local-variation, and global shape descriptors) of the model's final-layer hidden-state trajectory. It requires only a single deterministic forward pass, with no repeated sampling and no auxiliary model. Evaluated on four heterogeneous benchmarks (FEVER, SciFact, WikiBio-hallucination, and TruthfulQA), the method shows strong AUROC and AUPR relative to established baselines while remaining computationally efficient.
📝 Abstract
Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, semantic similarity, and multi-sample consistency, remain brittle under distribution shift, domain-specialised text, and compute limits. In this work, we present Structural Confidence, a single-pass, model-agnostic framework that improves prediction of output correctness using multi-scale structural signals derived from a model's final-layer hidden-state trajectory. By combining spectral, local-variation, and global shape descriptors, our method captures internal stability patterns that token probabilities and sentence embeddings miss. We conduct an extensive cross-domain evaluation on four heterogeneous benchmarks: FEVER (fact verification), SciFact (scientific claims), WikiBio-hallucination (biographical consistency), and TruthfulQA (truthfulness-oriented QA). Structural Confidence demonstrates strong performance compared with established baselines in terms of AUROC and AUPR. More importantly, unlike sampling-based consistency methods, which require multiple stochastic generations and an auxiliary model, our approach uses a single deterministic forward pass, offering a practical basis for efficient, robust post-hoc confidence estimation in socially impactful, resource-constrained LLM applications.
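To make the three descriptor families concrete, the following is a minimal sketch of what spectral, local-variation, and global shape features of a hidden-state trajectory could look like. The abstract does not define the descriptors, so every function name and formula here (`step_lengths`, `local_variation`, `straightness`, `high_freq_ratio`, `structural_features`) is an illustrative assumption and not the authors' method. The trajectory is assumed to be a list of final-layer hidden-state vectors, one per token; how they are extracted from the model is outside the sketch.

```python
import cmath
import math


def step_lengths(traj):
    """Euclidean distance between each pair of consecutive hidden states."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]


def local_variation(traj):
    """Local descriptor (assumed): mean step length along the trajectory."""
    steps = step_lengths(traj)
    return sum(steps) / len(steps)


def straightness(traj):
    """Global shape descriptor (assumed): net displacement / path length.

    1.0 means the trajectory is a straight line; values near 0.0 mean it
    wanders and ends close to where it started.
    """
    total = sum(step_lengths(traj))
    return math.dist(traj[0], traj[-1]) / total if total > 0 else 1.0


def high_freq_ratio(traj):
    """Spectral descriptor (assumed): share of step-length signal power
    that falls in the upper half of the frequency bins."""
    steps = step_lengths(traj)
    n = len(steps)
    mean = sum(steps) / n
    centered = [s - mean for s in steps]  # centering removes the DC term
    # Naive DFT power at frequencies k = 1 .. n // 2.
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # constant step lengths: no oscillation at all
    cut = len(power) // 2
    return sum(power[cut:]) / total


def structural_features(traj):
    """Bundle the three illustrative descriptors into one feature dict."""
    if len(traj) < 3:
        raise ValueError("need at least 3 hidden states")
    return {
        "local_variation": local_variation(traj),
        "straightness": straightness(traj),
        "high_freq_ratio": high_freq_ratio(traj),
    }


# Toy 2-D "hidden states" that move steadily in one direction.
smooth = [[float(t), 0.0] for t in range(8)]
print(structural_features(smooth))
```

In a setup like the one the abstract describes, features of this kind would feed a lightweight classifier trained to predict output correctness, and its scores would be evaluated with AUROC and AUPR. The classifier and its training data are not shown because the abstract does not specify them.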