🤖 AI Summary
This study addresses the emergence of “shadow encyclopedias” (AI-generated knowledge repositories masquerading as independent sources) by investigating Grokipedia, launched in October 2025, and its relationship with Wikipedia.
Method: Leveraging web crawling, cosine similarity on BERT-based semantic embeddings, and source credibility classification, we quantitatively compare content originality, citation quality, and topical distribution across both platforms.
Contribution/Results: Grokipedia exhibits high textual derivation from Wikipedia (mean similarity: 82%), yet selectively rewrites the highest-quality Wikipedia articles, with a bias towards biographies, politics, society, and history. Critically, its citations contain 3.7× more unreliable sources than Wikipedia’s and systematically omit peer-reviewed journals and authoritative news outlets. These findings expose structural biases in knowledge reuse and citation practices within AI-curated encyclopedias, revealing a critical gap in epistemic accountability. The study provides empirical evidence to inform governance frameworks for digital knowledge ecosystems and underscores urgent needs for transparency, provenance tracking, and citation integrity standards in AI-augmented knowledge infrastructure.
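As a rough illustration of the similarity measure named in the Method line, the sketch below computes cosine similarity between paired article embeddings and averages the scores. It is not the authors' pipeline: the function names and the toy vectors are assumptions made for this example, and real embeddings would come from whichever model the study used.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pair_similarity(pairs) -> float:
    """Average similarity over (grokipedia_vec, wikipedia_vec) pairs."""
    scores = [cosine_similarity(g, w) for g, w in pairs]
    return sum(scores) / len(scores)


# Toy, hypothetical embeddings: one identical pair, one orthogonal pair.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]
toy_mean = mean_pair_similarity(toy_pairs)
```

A score near 1 means the two articles point in nearly the same direction in embedding space; a score near 0 means they share little semantic content under that model.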
📝 Abstract
Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets show less similarity between their Wikipedia and Grokipedia versions than other pages. The random subset illustrates that Grokipedia focused on rewriting the highest quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly-full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.
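The citation comparison described in the abstract can be pictured as a domain lookup: take each cited URL, extract its host, and check it against a list of sources flagged as unreliable. The sketch below is a minimal version under stated assumptions. The flagged set and the URLs are placeholders invented for this example, not the English Wikipedia community's actual list, and the paper may classify sources differently.

```python
from urllib.parse import urlparse


def citation_domain(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def unreliable_share(urls, flagged_domains) -> float:
    """Fraction of citations whose domain is in the flagged set.

    Matching is exact on the host, so other subdomains of a flagged
    site are not counted in this simplified version.
    """
    if not urls:
        return 0.0
    hits = sum(1 for u in urls if citation_domain(u) in flagged_domains)
    return hits / len(urls)


# Hypothetical placeholder data, for illustration only.
flagged = {"flagged.example"}
citations = [
    "https://www.flagged.example/post/1",
    "https://journal.example/article/42",
    "https://news.example/story",
    "https://archive.example/item",
]
share = unreliable_share(citations, flagged)
```

Computing this share separately for each site's citations and taking the ratio gives a simple way to express how much more often one encyclopedia cites flagged sources than the other.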