🤖 AI Summary
Large language models (LLMs) encode factual knowledge implicitly, making it difficult to trace, audit, or analyze statistically.
Method: This paper introduces a large-scale recursive LLM knowledge materialization framework that combines GPT-4.1 with recursive prompting, entity linking, triple extraction, and consistency verification to make implicit factual knowledge explicit at scale.
Contribution/Results: The framework constructs a densely interconnected knowledge base comprising 100 million high-quality RDF triples. The resulting knowledge graph supports SPARQL querying, interactive graph navigation, and multi-hop link traversal, enabling traceable, queryable, and comparative knowledge analysis. Built for $14,000, it demonstrates the storage efficiency, analytical scalability, and automation potential of LLM-based knowledge base construction. A demonstration system is publicly released as open-source software.
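The recursive materialization loop described above (recursive prompting plus triple extraction) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `elicit_triples` is a hypothetical stand-in for a GPT-4.1 call, stubbed here with fixed toy data, and the entity names are invented.

```python
from collections import deque

def elicit_triples(entity):
    """Hypothetical stand-in for an LLM call that elicits
    (subject, predicate, object) triples about an entity.
    Stubbed here with a fixed toy response."""
    toy_knowledge = {
        "Douglas Adams": [
            ("Douglas Adams", "author of", "The Hitchhiker's Guide to the Galaxy"),
            ("Douglas Adams", "born in", "Cambridge"),
        ],
        "Cambridge": [("Cambridge", "located in", "England")],
    }
    return toy_knowledge.get(entity, [])

def materialize(seed, max_entities=1000):
    """Breadth-first recursive materialization: elicit triples for a
    seed entity, then recurse into each newly named object entity."""
    triples, seen, queue = [], {seed}, deque([seed])
    while queue and len(seen) <= max_entities:
        entity = queue.popleft()
        for s, p, o in elicit_triples(entity):
            triples.append((s, p, o))
            if o not in seen:  # newly discovered entity: queue it for elicitation
                seen.add(o)
                queue.append(o)
    return triples

print(materialize("Douglas Adams"))
```

The frontier grows as new object entities are discovered, which is what makes the resulting KB densely interlinked rather than a flat list of facts about seed entities.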
📝 Abstract
Language models are powerful tools, yet their factual knowledge is still poorly understood and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlinked 100-million-triple knowledge base (KB) built for $14,000 from GPT-4.1, using the GPTKB methodology for massive-recursive LLM knowledge materialization (Hu et al., ACL 2025). The demonstration experience focuses on three use cases: (1) link-traversal-based LLM knowledge exploration, (2) SPARQL-based structured LLM knowledge querying, (3) comparative exploration of the strengths and weaknesses of LLM knowledge. Massive-recursive LLM knowledge materialization is a groundbreaking opportunity both for the research area of systematic analysis of LLM knowledge and for automated KB construction. The GPTKB demonstrator is accessible at https://gptkb.org.
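To make use cases (1) and (2) concrete, the sketch below evaluates a two-hop pattern over a toy in-memory triple store, mimicking both link traversal in a KB browser and a SPARQL basic graph pattern. All triples and predicate names here are invented for illustration; they are not taken from GPTKB.

```python
# Toy triple store; in GPTKB this would be 100M triples behind a SPARQL endpoint.
TRIPLES = [
    ("Douglas Adams", "bornIn", "Cambridge"),
    ("Cambridge", "locatedIn", "England"),
    ("Ada Lovelace", "bornIn", "London"),
    ("London", "locatedIn", "England"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def two_hop(subject, p1, p2):
    """Follow p1 from subject, then p2 from each intermediate node,
    e.g. person --bornIn--> city --locatedIn--> country.
    Roughly the SPARQL pattern:  <subject> p1 ?c . ?c p2 ?o ."""
    return [o2 for o1 in objects(subject, p1) for o2 in objects(o1, p2)]

print(two_hop("Douglas Adams", "bornIn", "locatedIn"))  # → ['England']
```

A SPARQL engine answers such patterns declaratively over the whole graph, while the link-traversal interface lets a user follow the same hops interactively, one entity page at a time.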