LLM hallucinations in the wild: Large-scale evidence from non-existent citations

📅 2026-05-08
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the growing threat that hallucinated citations generated by large language models (LLMs) pose to academic writing, undermining the reliability and equity of scientific knowledge. Drawing on 111 million references from 2.5 million papers across arXiv, bioRxiv, SSRN, and PubMed Central, the authors combine citation verification algorithms, textual feature analysis, and statistical modeling to identify and quantify LLM-induced citation hallucinations. They conservatively estimate 146,932 hallucinated references in 2025 alone, spread across many disciplines and most pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. Preprint moderation and journal publication processes catch only a fraction of these errors, and hallucinated references disproportionately credit already prominent and male scholars, which risks reinforcing existing inequities in scientific recognition.
📝 Abstract
Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
Problem

Research questions and friction points this paper is trying to address.

LLM hallucinations
non-existent citations
scientific reliability
AI-generated errors
research equity
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM hallucination
non-existent citations
AI-assisted writing
scientific inequity
large-scale auditing