GPTKB: Comprehensively Materializing Factual LLM Knowledge

📅 2024-11-07
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM knowledge analysis relies on small-scale, manually crafted queries, which suffer from availability bias and fail to comprehensively characterize models' implicit factual knowledge. Method: We propose a recursive knowledge materialization framework in which the LLM autonomously generates question-answer pairs, extracts entity-relation triples, and performs multi-round consistency verification and fusion, achieving the first unbiased, large-scale explicit extraction of factual knowledge from LLMs. The method uses GPT-4o-mini without human-defined questions or domain-specific priors. Contribution/Results: We construct GPTKB, a knowledge base covering 2.9 million entities and 105 million high-quality triples, at just 1% of the cost of conventional knowledge base projects. GPTKB is fully open-sourced. This work establishes a new paradigm for low-cost, domain-agnostic, decentralized knowledge base construction.
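The recursive querying loop at the core of the method can be sketched as below. This is a minimal illustration, not the paper's implementation: `elicit_triples` stands in for the GPT-4o-mini call and returns hypothetical canned triples, a set provides simple duplicate consolidation, and the paper's full pipeline additionally performs consistency verification and fusion.

```python
from collections import deque

def elicit_triples(entity):
    # Placeholder for the LLM call (the paper uses GPT-4o-mini).
    # Returns (subject, predicate, object) triples about the entity.
    # The canned data here is purely illustrative.
    canned = {
        "Vannevar Bush": [
            ("Vannevar Bush", "instanceOf", "person"),
            ("Vannevar Bush", "notableWork", "Memex"),
        ],
        "Memex": [
            ("Memex", "instanceOf", "concept"),
        ],
    }
    return canned.get(entity, [])

def materialize(seed, max_entities=100):
    """Recursively elicit triples, queueing newly seen objects
    as candidate entities until the frontier is exhausted."""
    queue = deque([seed])
    seen = {seed}
    kb = set()  # a set consolidates exact-duplicate triples
    while queue and len(seen) <= max_entities:
        entity = queue.popleft()
        for s, p, o in elicit_triples(entity):
            kb.add((s, p, o))
            if o not in seen:  # treat each new object as a new entity to query
                seen.add(o)
                queue.append(o)
    return kb

kb = materialize("Vannevar Bush")
```

With a real LLM behind `elicit_triples`, the same breadth-first expansion grows the KB outward from a single seed entity with no human-defined question templates.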

📝 Abstract
LLMs have majorly advanced NLP and AI, and next to their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an availability bias (Tversky and Kahneman, 1973) that prevents the discovery of knowledge (or beliefs) of LLMs beyond the experimenter's predisposition. To address this challenge, we propose a novel methodology for comprehensively materializing an LLM's factual knowledge through recursive querying and result consolidation. As a prototype, we employ GPT-4o-mini to construct GPTKB, a large-scale knowledge base (KB) comprising 105 million triples for over 2.9 million entities - achieved at 1% of the cost of previous KB projects. This work marks a milestone in two areas: For LLM research, for the first time, it provides constructive insights into the scope and structure of LLMs' knowledge (or beliefs). For KB construction, it pioneers new pathways for the long-standing challenge of general-domain KB construction. GPTKB is accessible at https://gptkb.org.
Problem

Research questions and friction points this paper is trying to address.

Analyzing LLM knowledge beyond predefined samples to avoid bias
Materializing an LLM's factual knowledge via recursive querying and consolidation
Building a scalable knowledge base to assess LLM knowledge scope
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive querying for comprehensive knowledge materialization
Constructing large-scale knowledge base from LLM outputs
Multi-dimensional analysis of LLM knowledge characteristics
Yujia Hu
ScaDS.AI & TU Dresden, Germany
Shrestha Ghosh
University of Tübingen
Tuan-Phong Nguyen
Max Planck Institute for Informatics, Saarbrücken, Germany
S. Razniewski
ScaDS.AI & TU Dresden, Germany

Knowledge Bases · Natural Language Processing · Information Retrieval · Information Extraction