LLM hallucinations in the wild: Large-scale evidence from non-existent citations

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines hallucinated citations generated by large language models (LLMs) in academic writing, a problem that threatens the reliability and equity of scientific knowledge. Using a dataset of 111 million references from 2.5 million papers across arXiv, bioRxiv, SSRN, and PubMed Central, the authors employ citation verification algorithms, textual feature analysis, and statistical modeling to identify and quantify LLM-induced citation hallucinations. They conservatively estimate 146,932 hallucinated references in 2025 alone, spread across many disciplines and most pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. Preprint moderation and journal publication processes catch only a fraction of these errors, and the hallucinated references disproportionately credit already prominent and male scholars, which may reinforce existing inequities in scientific recognition.

📝 Abstract

Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.

Problem

Research questions and friction points this paper is trying to address.

LLM hallucinations

non-existent citations

scientific reliability

AI-generated errors

research equity

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM hallucination

non-existent citations

AI-assisted writing