GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge

📅 2025-07-08
🤖 AI Summary
Large language models (LLMs) encode factual knowledge implicitly, making it difficult to trace, audit, or analyze statistically. Method: This paper introduces a large-scale recursive LLM knowledge materialization framework, leveraging GPT-4.1 with recursive prompting, entity linking, triple extraction, and consistency verification, to make implicit factual knowledge explicit at scale. Contribution/Results: The framework constructs a densely interlinked knowledge base comprising 100 million high-quality RDF triples. The resulting knowledge graph supports SPARQL querying, interactive graph navigation, and multi-hop link traversal, enabling traceable, queryable, and comparative knowledge analysis. Built at a cost of $14,000, it demonstrates that knowledge base construction from LLMs can be scalable and largely automated. A demonstration system is publicly released as open-source software.
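The recursive elicitation loop described above can be sketched as a breadth-first traversal: query the LLM about a seed entity, then about every newly discovered object entity, until no new entities remain. This is a minimal illustration, not the authors' implementation; `ask_llm` is a placeholder for the GPT-4.1 prompting, entity linking, and verification pipeline, and the stub triples are invented for demonstration.

```python
from collections import deque

def ask_llm(entity):
    """Placeholder for an LLM call returning (subject, predicate, object)
    triples about `entity`. Stubbed here; GPTKB prompts GPT-4.1."""
    stub = {
        "Tim Berners-Lee": [("Tim Berners-Lee", "invented", "World Wide Web")],
        "World Wide Web": [("World Wide Web", "basedOn", "HTTP")],
    }
    return stub.get(entity, [])

def materialize(seed, max_entities=1000):
    """Breadth-first recursive materialization: elicit triples for the seed,
    then recurse into every newly discovered object entity."""
    triples, seen, queue = [], {seed}, deque([seed])
    while queue and len(seen) <= max_entities:
        entity = queue.popleft()
        for s, p, o in ask_llm(entity):
            triples.append((s, p, o))
            if o not in seen:  # newly discovered entity -> enqueue for recursion
                seen.add(o)
                queue.append(o)
    return triples

kb = materialize("Tim Berners-Lee")
print(kb)  # two triples discovered from the stub
```

At GPTKB's scale this loop additionally needs deduplication, entity canonicalization, and consistency checks, which the sketch omits.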

📝 Abstract
Language models are powerful tools, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlinked 100-million-triple knowledge base (KB) built for $14,000 from GPT-4.1, using the GPTKB methodology for massive-recursive LLM knowledge materialization (Hu et al., ACL 2025). The demonstration experience focuses on three use cases: (1) link-traversal-based LLM knowledge exploration, (2) SPARQL-based structured LLM knowledge querying, (3) comparative exploration of the strengths and weaknesses of LLM knowledge. Massive-recursive LLM knowledge materialization is a groundbreaking opportunity both for the research area of systematic analysis of LLM knowledge, as well as for automated KB construction. The GPTKB demonstrator is accessible at https://gptkb.org.
Problem

Research questions and friction points this paper is trying to address.

Understanding and accessing factual knowledge in language models
Creating a scalable knowledge base from LLM-generated data
Exploring strengths and weaknesses of LLM knowledge systematically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Densely interlinked 100M-triple KB from GPT-4.1
Massive-recursive LLM knowledge materialization
SPARQL-based structured LLM knowledge querying
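The structured-querying use case above can be illustrated with a small SPARQL query and result handler. The query text and the `gptkb:` values are assumptions for illustration, not the actual GPTKB vocabulary or endpoint; the response is mocked in the standard SPARQL 1.1 JSON results format that a real endpoint would return.

```python
import json

# Illustrative query; predicate names are assumptions, not GPTKB's schema.
QUERY = """
SELECT ?person ?invention WHERE {
  ?person :invented ?invention .
} LIMIT 10
"""

# Mocked endpoint response in the SPARQL 1.1 Query Results JSON Format.
MOCK_RESPONSE = json.dumps({
    "head": {"vars": ["person", "invention"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "gptkb:Tim_Berners-Lee"},
         "invention": {"type": "uri", "value": "gptkb:World_Wide_Web"}},
    ]},
})

def rows(response_text):
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    data = json.loads(response_text)
    return [{var: b[var]["value"] for var in data["head"]["vars"] if var in b}
            for b in data["results"]["bindings"]]

print(rows(MOCK_RESPONSE))
```

Against the live demonstrator, the same handler would apply to the JSON returned by its SPARQL endpoint.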