🤖 AI Summary
Quantifying generation uncertainty in black-box large language models (LLMs) is challenging because neither token-level probabilities nor ground-truth labels are available. Method: We propose an unsupervised conformal inference framework for LLMs that uses the geometric structure of response embeddings to construct a Gram-matrix-based atypicality score. It combines UCP with a bootstrapping variant (BB-UCP) and conformal alignment to calibrate thresholds against a user predicate with finite-sample statistical guarantees, without accessing model internals or ground-truth labels. Contribution/Results: Relying solely on response embeddings and bootstrap-residual aggregation, the framework provides uncertainty calibration and hallucination filtering. Experiments across multiple benchmarks show close-to-nominal coverage, consistently reduced hallucination severity, tighter and more stable thresholds than split UCP, and better performance than lightweight per-response detectors at comparable computational cost.
📝 Abstract
Deploying black-box LLMs requires managing uncertainty in the absence of token-level probabilities or true labels. We propose an unsupervised conformal inference framework for generation, which integrates: (i) an LLM-compatible atypicality score derived from the response-embedding Gram matrix, (ii) UCP combined with a bootstrapping variant (BB-UCP) that aggregates residuals to refine quantile precision while maintaining distribution-free, finite-sample coverage, and (iii) conformal alignment, which calibrates a single strictness parameter $\tau$ so that a user predicate (e.g., factuality lift) holds on unseen batches with probability $\ge 1-\alpha$. Across different benchmark datasets, our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP, while consistently reducing the severity of hallucination and outperforming lightweight per-response detectors with similar computational demands. The result is a label-free, API-compatible gate for test-time filtering that turns geometric signals into calibrated, goal-aligned decisions.
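To make the ingredients above concrete, here is a minimal sketch of a Gram-matrix atypicality score followed by a plain split-conformal threshold. The abstract does not spell out the exact score or the BB-UCP aggregation, so everything here is an illustrative assumption: the score (one minus the mean off-diagonal cosine similarity), the function names `gram_atypicality` and `conformal_threshold`, and the use of the standard $\lceil (n+1)(1-\alpha) \rceil$-th order statistic. The bootstrap variant and the $\tau$ alignment step are not shown.

```python
import math
import numpy as np


def gram_atypicality(embeddings):
    """Hypothetical atypicality score (an assumption, not the paper's definition).

    Rows are response embeddings. After unit-normalizing each row, the Gram
    matrix G = E E^T holds pairwise cosine similarities. A response is scored
    as 1 minus its mean similarity to the other responses, so responses that
    disagree with the rest of the batch receive higher scores.
    """
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    G = E @ E.T
    n = G.shape[0]
    # Exclude the diagonal (self-similarity) from the row mean.
    mean_sim = (G.sum(axis=1) - np.diag(G)) / (n - 1)
    return 1.0 - mean_sim


def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal quantile of calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score. If the scores
    are exchangeable with a new score, the new score falls at or below this
    threshold with probability at least 1 - alpha. When n is too small for
    the requested alpha, the threshold is infinite (nothing is rejected).
    """
    s = np.sort(np.asarray(cal_scores, dtype=float))
    n = s.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(s[k - 1])


if __name__ == "__main__":
    # Toy embeddings: two responses agree, the third points elsewhere.
    scores = gram_atypicality([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    threshold = conformal_threshold(scores, alpha=0.5)
    keep = scores <= threshold  # gate: keep responses that look typical
    print(scores, threshold, keep)
```

In this toy batch the third response is orthogonal to the other two, so it gets the highest score and is the one filtered out by the gate. Only embeddings are needed, which matches the label-free, API-only setting described in the abstract.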