Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge

📅 2025-10-08
🤖 AI Summary
Contemporary large language models (LLMs), such as GPT-4.1, exhibit critical deficiencies in factual knowledge, namely hallucination, internal inconsistency, and semantic ambiguity, yet existing evaluations rely on small-scale, biased benchmarks that inadequately reflect real-world knowledge reliability. Method: We systematically analyze 100 million belief statements generated by the model, drawing on GPTKB v1.5 (Hu et al., 2025a), a large-scale factual dataset constructed via recursive elicitation to improve representativeness and reduce sampling bias. We then apply statistical modeling and cross-source validation against authoritative knowledge bases (e.g., Wikidata, DBpedia). Contribution/Results: Our analysis quantifies pervasive hallucinations and inconsistencies in LLM knowledge at scale for the first time, revealing substantially lower factual accuracy than standard benchmarks report, substantial distributional divergence from structured knowledge repositories, and severe methodological biases in current evaluation protocols. This work establishes a reproducible, large-scale empirical framework and a new benchmark paradigm for rigorous LLM knowledge assessment.
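The recursive elicitation the summary refers to can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' pipeline: the `FAKE_LLM` dictionary stands in for actual GPT-4.1 prompting, and the entity check is deliberately crude. The idea is breadth-first expansion: each entity mentioned as an object in an elicited belief becomes a new subject to query, so coverage grows from a seed rather than from a hand-picked benchmark sample.

```python
from collections import deque

# Toy stand-in for an LLM call; a real pipeline would prompt the model
# for (subject, predicate, object) beliefs about each subject.
FAKE_LLM = {
    "Ada Lovelace": [("Ada Lovelace", "collaborator", "Charles Babbage"),
                     ("Ada Lovelace", "field", "mathematics")],
    "Charles Babbage": [("Charles Babbage", "invention", "Analytical Engine")],
}

def elicit_beliefs(seed, max_subjects=100):
    """Breadth-first recursive elicitation: every object that looks like a
    queryable entity becomes a new subject, until the frontier is empty."""
    queue, seen, beliefs = deque([seed]), {seed}, []
    while queue and len(seen) <= max_subjects:
        subject = queue.popleft()
        for triple in FAKE_LLM.get(subject, []):
            beliefs.append(triple)
            obj = triple[2]
            # Crude entity check: recurse only into objects the "model" knows.
            if obj in FAKE_LLM and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return beliefs

beliefs = elicit_beliefs("Ada Lovelace")
```

Run on the toy data, the seed "Ada Lovelace" yields three beliefs, including one discovered only by recursing into "Charles Babbage"; the cap on `max_subjects` is one plausible way to bound the expansion, though the actual stopping criteria of GPTKB are described in Hu et al. (2025a).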

📝 Abstract
LLMs are remarkable artifacts that have revolutionized a range of NLP and AI tasks. A significant contributor is their factual knowledge, which, to date, remains poorly understood, and is usually analyzed from biased samples. In this paper, we take a deep tour into the factual knowledge (or beliefs) of a frontier LLM, based on GPTKB v1.5 (Hu et al., 2025a), a recursively elicited set of 100 million beliefs of one of the strongest currently available frontier LLMs, GPT-4.1. We find that the model's factual knowledge differs quite significantly from established knowledge bases, and that its accuracy is significantly lower than indicated by previous benchmarks. We also find that inconsistency, ambiguity and hallucinations are major issues, shedding light on future research opportunities concerning factual LLM knowledge.
Problem

Research questions and friction points this paper is trying to address.

Analyzing factual knowledge and beliefs in a frontier LLM using 100M belief statements
Revealing that LLM factual accuracy is significantly lower than standard benchmarks indicate
Identifying inconsistency, ambiguity and hallucinations as major knowledge issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes 100M beliefs from frontier LLM
Uses recursively elicited GPTKB v1.5 dataset
Reveals accuracy gaps via comparison against established knowledge bases
Shrestha Ghosh
University of Tübingen
Knowledge Bases · Natural Language Processing · Information Retrieval · Information Extraction
Luca Giordano
ScaDS.AI Dresden/Leipzig & TU Dresden, Germany
Yujia Hu
ScaDS.AI Dresden/Leipzig & TU Dresden, Germany
Tuan-Phong Nguyen
VNU University of Engineering and Technology, Hanoi, Vietnam
Simon Razniewski
Professor at ScaDS.AI & TU Dresden
Language Models · Knowledge Bases · Commonsense Knowledge · NLP