What did Elon change? A comprehensive analysis of Grokipedia

📅 2025-11-12
🤖 AI Summary
This study addresses the emergence of “shadow encyclopedias” (AI-generated knowledge repositories masquerading as independent sources) by investigating Grokipedia, launched in October 2025, and its relationship with Wikipedia.

Method: Using web crawling, cosine similarity on BERT-based semantic embeddings, and source credibility classification, the authors quantitatively compare content originality, citation quality, and topical distribution across both platforms.

Contribution/Results: Grokipedia's text is highly derivative of Wikipedia (mean similarity: 82%), yet it selectively rewrites high-quality articles in history, politics, and sociology. Its citations contain 3.7× more unreliable sources than Wikipedia’s and systematically omit peer-reviewed journals and authoritative news outlets. These findings expose structural biases in how AI-curated encyclopedias reuse knowledge and cite sources, and they reveal a gap in epistemic accountability. The study provides empirical evidence to inform governance frameworks for digital knowledge ecosystems and underscores the need for transparency, provenance tracking, and citation integrity standards in AI-augmented knowledge infrastructure.
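The cosine similarity measure mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions for illustration, standing in for the embeddings a real model would produce for a Wikipedia article and its Grokipedia counterpart.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, not real article embeddings).
wiki_vec = [1.0, 2.0, 3.0]
grok_vec = [2.0, 4.0, 6.0]        # same direction as wiki_vec
unrelated_vec = [3.0, 0.0, -1.0]  # orthogonal to wiki_vec

print(cosine_similarity(wiki_vec, grok_vec))
print(cosine_similarity(wiki_vec, unrelated_vec))
```

Because cosine similarity ignores vector length, a longer rewrite of the same content can still score close to 1, which fits the observation that Grokipedia articles are longer yet highly similar to their Wikipedia versions.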

📝 Abstract
Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices differ greatly between the sites, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze three article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected-official and controversial-topic subsets show less similarity between their Wikipedia and Grokipedia versions than other pages do. The random subset illustrates that Grokipedia focused on rewriting the highest-quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.
Problem

Research questions and friction points this paper is trying to address.

Analyzing content similarity between Grokipedia and Wikipedia articles
Comparing citation reliability practices across the two platforms
Investigating topic bias in Grokipedia's selective content rewriting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed article similarity using comparative corpus analysis
Evaluated citation reliability through source quality assessment
Applied subset analysis to identify content rewriting patterns
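The citation-reliability comparison can be sketched as a domain lookup. This is a minimal illustration, not the paper's classifier: the set `UNRELIABLE_DOMAINS` is a hypothetical stand-in built from the two sites named in the abstract (the real analysis relies on the English Wikipedia community's source ratings and external scholarly lists), and the helper names and sample URLs are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical stand-in list; a real analysis would load a full ratings list.
UNRELIABLE_DOMAINS = {"infowars.com", "stormfront.org"}

def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def unreliable_share(citation_urls, unreliable=UNRELIABLE_DOMAINS):
    """Fraction of citations whose domain appears in the unreliable set."""
    if not citation_urls:
        return 0.0
    flagged = sum(1 for url in citation_urls if domain_of(url) in unreliable)
    return flagged / len(citation_urls)

# Made-up citation list for one article.
sample_citations = [
    "https://www.infowars.com/a",
    "https://example.org/b",
    "https://example.com/c",
    "https://stormfront.org/d",
]
print(unreliable_share(sample_citations))
```

Computing this share per article on each platform and comparing the results is one simple way to express how much more often one site cites flagged sources than the other.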