AI Summary
To address the high computational cost and poor robustness of factuality assessment for long-text generation by large language models (LLMs), this paper proposes the Semantic Isotropy Index (SII), an unsupervised metric that quantifies the semantic uniformity of generated content via the angular dispersion of normalized text embeddings on the unit hypersphere. SII detects factual inconsistency without requiring labeled data, model fine-tuning, or hyperparameter optimization, and yields stable predictions from only a handful of sampled responses. Evaluated across diverse open-domain question-answering tasks, SII substantially outperforms existing fact-checking and consistency-evaluation methods in accuracy while incurring minimal computational overhead and enabling straightforward deployment. Its efficiency, scalability, and annotation-free design establish a practical new paradigm for trustworthiness assessment in real-world LLM applications.
Abstract
To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods that assess the trustworthiness of long-form responses generated by LLMs. However, existing approaches often rely on claim-by-claim fact-checking, which is computationally expensive and brittle in long-form responses to open-ended prompts. In this work, we introduce semantic isotropy -- the degree of uniformity across normalized text embeddings on the unit sphere -- and use it to assess the trustworthiness of long-form responses generated by LLMs. To do so, we generate several long-form responses, embed them, and estimate the level of semantic isotropy of these responses as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy -- that is, greater embedding dispersion -- reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and can be used with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing approaches in predicting nonfactuality in long-form responses using only a handful of samples -- offering a practical, low-cost approach for integrating trust assessment into real-world LLM workflows.
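The procedure described above can be sketched in a few lines: embed several sampled responses, project the embeddings onto the unit sphere, and score their angular dispersion. The dispersion measure below (one minus the resultant length of the mean direction, borrowed from directional statistics) is an illustrative assumption, not necessarily the paper's exact estimator; the embeddings would in practice come from any open- or closed-weight embedding model.

```python
import numpy as np

def semantic_isotropy(embeddings: np.ndarray) -> float:
    """Estimate the angular dispersion of response embeddings.

    embeddings: (n, d) array with one row per sampled long-form response.
    Returns a score in [0, 1]; higher dispersion signals lower factual
    consistency across the samples, per the paper's finding.
    """
    # Project each embedding onto the unit hypersphere.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Dispersion proxy: 1 - length of the mean direction (the "resultant
    # length"). Identical directions give 0; fully spread directions
    # approach 1. This choice of proxy is an assumption for illustration.
    resultant = np.linalg.norm(unit.mean(axis=0))
    return 1.0 - resultant

# Toy check: tightly clustered embeddings (consistent responses) should
# score lower than randomly spread ones (inconsistent responses).
rng = np.random.default_rng(0)
base = rng.normal(size=8)
clustered = base + 0.01 * rng.normal(size=(5, 8))
spread = rng.normal(size=(5, 8))
print(semantic_isotropy(clustered), semantic_isotropy(spread))
```

Because the score needs only a handful of samples and a single pass through an embedding model, it avoids the per-claim calls that make claim-by-claim fact-checking expensive.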