Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing uncertainty quantification (UQ) methods for large language models (LLMs) lack a rigorous probabilistic foundation and semantic consistency. Method: The paper proposes a fully probabilistic UQ framework for LLMs, centered on an inverse model that characterizes the semantic diversity of the input space conditioned on a given output. It models the input–output relationship via dual random walks and introduces Inv-Entropy, a novel uncertainty measure grounded in inverse probabilistic inference. The framework integrates semantic-similarity embeddings, a genetic-algorithm-assisted adaptive perturbation (GAAP) strategy, and a new evaluation metric, Temperature Sensitivity of Uncertainty (TSU), which requires no ground-truth labels. Contribution/Results: Extensive experiments show that the framework outperforms state-of-the-art semantic UQ approaches across multiple tasks. The implementation is open source, and its modular architecture allows embedding, perturbation, and similarity components to be swapped freely.
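The inverse-model idea above can be sketched in a few lines. This is only an illustration of the general mechanism, not the paper's actual formulation: the exponential weighting, the cosine similarity, and the `inv_entropy` function here are all assumptions chosen for simplicity. The idea is that for each sampled output, one forms a distribution over the perturbed inputs weighted by semantic similarity, and uncertainty is the average entropy of these conditional input distributions.

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def inv_entropy(input_embs, output_embs):
    """Illustrative inverse-model uncertainty: for each sampled output y,
    build a distribution over the perturbed inputs x with weights
    proportional to exp(similarity), then average the Shannon entropies
    of these conditional distributions. Higher values mean the output is
    compatible with a more diverse set of inputs."""
    entropies = []
    for y in output_embs:
        sims = np.array([np.exp(cosine_sim(x, y)) for x in input_embs])
        p = sims / sims.sum()  # heuristic stand-in for p(x | y)
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    return float(np.mean(entropies))

# toy example: two near-duplicate input embeddings, two distinct outputs
inputs = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
outputs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
u = inv_entropy(inputs, outputs)
```

With two inputs the entropy is bounded by log 2, and near-duplicate inputs yield a nearly uniform conditional distribution, so `u` lands close to that bound.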

📝 Abstract
Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods are often heuristic and lack a probabilistic foundation. This paper begins by providing a theoretical justification for the role of perturbations in UQ for LLMs. We then introduce a dual random walk perspective, modeling input-output pairs as two Markov chains with transition probabilities defined by semantic similarity. Building on this, we propose a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output through systematic perturbations. Within this framework, we define a new uncertainty measure, Inv-Entropy. A key strength of our framework is its flexibility: it supports various definitions of uncertainty measures, embeddings, perturbation strategies, and similarity metrics. We also propose GAAP, a perturbation algorithm based on genetic algorithms, which enhances the diversity of sampled inputs. In addition, we introduce a new evaluation metric, Temperature Sensitivity of Uncertainty (TSU), which directly assesses uncertainty without relying on correctness as a proxy. Extensive experiments demonstrate that Inv-Entropy outperforms existing semantic UQ methods. The code to reproduce the results can be found at https://github.com/UMDataScienceLab/Uncertainty-Quantification-for-LLMs.
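The abstract's TSU metric evaluates an uncertainty measure without correctness labels, on the premise that a sound measure should rise with sampling temperature. A minimal sketch of that premise, assuming a pairwise-concordance score (the paper's exact definition may differ, and the function name is illustrative):

```python
def temperature_sensitivity(temps, uncertainties):
    """Illustrative TSU-style check (label-free): score the fraction of
    temperature pairs whose ordering agrees with the ordering of the
    measured uncertainties (a Kendall-style concordance in [0, 1])."""
    pairs = [(i, j) for i in range(len(temps)) for j in range(i + 1, len(temps))]
    concordant = sum(
        1 for i, j in pairs
        if (temps[j] - temps[i]) * (uncertainties[j] - uncertainties[i]) > 0
    )
    return concordant / len(pairs)

# toy example: hypothetical uncertainty readings at rising temperatures
temps = [0.2, 0.5, 0.8, 1.1]
u_vals = [0.10, 0.25, 0.55, 0.60]
score = temperature_sensitivity(temps, u_vals)  # 1.0: perfectly monotone
```

A score of 1.0 means the measure tracked temperature perfectly; an uncertainty measure that ignores temperature would score near 0.5.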
Problem

Research questions and friction points this paper is trying to address.

Lack of probabilistic foundation in existing uncertainty quantification methods for LLMs
Need for theoretical justification of perturbations in LLM uncertainty quantification
Absence of flexible frameworks supporting diverse uncertainty measures and metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual random walk perspective models input-output pairs as Markov chains with similarity-based transitions
Inverse model quantifies uncertainty via the diversity of inputs conditioned on a given output
Genetic algorithm-based perturbation (GAAP) enhances the diversity of sampled inputs
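The genetic-algorithm perturbation can be sketched as a standard evolve-select loop. Everything here is a stand-in: `gaap_sketch`, the mutation operator, and the diversity score are illustrative placeholders, not the paper's GAAP operators, which work on semantic paraphrases rather than appended filler words.

```python
import random

def gaap_sketch(seed_prompt, mutate, diversity, pop_size=8, generations=5):
    """Illustrative genetic-algorithm perturbation loop in the spirit of
    GAAP: evolve a population of perturbed prompts, selecting candidates
    that add the most diversity to the pool."""
    population = [mutate(seed_prompt) for _ in range(pop_size)]
    for _ in range(generations):
        # rank candidates by how much diversity each adds to the pool
        scored = sorted(population, key=lambda p: diversity(p, population), reverse=True)
        parents = scored[: pop_size // 2]  # keep the most diverse half
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return population

# toy operators: mutation appends a filler word; diversity counts words
# a candidate does not share with any other candidate in the pool
fillers = ["please", "kindly", "exactly", "briefly"]

def mutate(p):
    return p + " " + random.choice(fillers)

def diversity(p, pool):
    others = {w for q in pool if q != p for w in q.split()}
    return len(set(p.split()) - others)

variants = gaap_sketch("What is the capital of France?", mutate, diversity)
```

In a real pipeline the mutation operator would produce semantic paraphrases and the diversity score would come from embedding distances, matching the framework's pluggable perturbation and similarity components.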