🤖 AI Summary
Existing benchmarks such as MMLU overestimate the factual accuracy of large language models due to fixed questions and availability bias. This work proposes a novel approach that automatically generates approximately one million encyclopedia-style articles directly from parametric memory—without external retrieval—to systematically materialize knowledge boundaries and enable large-scale factuality evaluation against verifiable web evidence. We present the first fully open-source parametric encyclopedia, releasing all prompts, generated content, and evaluation results. Experiments reveal that gpt-5-mini achieves 74.7% factual accuracy on Wikipedia-covered topics, dropping to 63.2% on frontier topics; only 61% of generated topics are covered by Wikipedia, and topic overlap among the three major model families is merely 7.3%. Compared to Grokipedia, our method achieves higher factuality at roughly half the textual similarity to Wikipedia, and provides a browsable open interface.
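The 7.3% topic overlap among model families presumably reflects a set-overlap measure over the subjects each family surfaces. A minimal sketch of how such a figure could be computed, using Jaccard similarity on hypothetical toy topic sets (the topic names and the exact metric choice are illustrative assumptions, not the paper's stated method):

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two topic sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy topic sets standing in for the subjects surfaced by three model families.
families = {
    "family_a": {"Alan Turing", "Photosynthesis", "Mount Everest", "Quicksort"},
    "family_b": {"Alan Turing", "Photosynthesis", "Great Wall of China", "Haiku"},
    "family_c": {"Photosynthesis", "Mount Everest", "Bebop", "Monsoon"},
}

# Pairwise Jaccard overlap for every pair of families.
pairwise = {
    (x, y): jaccard(families[x], families[y])
    for x, y in combinations(families, 2)
}

# Fraction of all surfaced topics shared by all three families.
shared = set.intersection(*families.values())
three_way = len(shared) / len(set.union(*families.values()))
```

With the toy sets above, only "Photosynthesis" is shared by all three families, so `three_way` is 1/8; the paper's 7.3% would come from running an analogous computation over the full ~1M generated subjects.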
📝 Abstract
Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. *LLMpedia* generates encyclopedic articles entirely from parametric memory, producing ~1M articles across three model families without retrieval. For gpt-5-mini, the verifiable true rate on Wikipedia-covered subjects is only 74.7% -- more than 15 percentage points below the benchmark-based picture, consistent with the availability bias of fixed-question evaluation. Beyond Wikipedia, frontier subjects verifiable only through curated web evidence fall further to a 63.2% true rate. Wikipedia covers just 61% of surfaced subjects, and the three model families overlap by only 7.3% in subject choice. In a capture-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia achieves substantially higher factuality at roughly half the textual similarity to Wikipedia. Unlike Grokipedia, every prompt, artifact, and evaluation verdict is publicly released, making LLMpedia the first fully open parametric encyclopedia -- bridging factuality evaluation and knowledge materialization. All data, code, and a browsable interface are at https://llmpedia.net.