🤖 AI Summary
This study addresses the growing threat of "ghost citations" fabricated by large language models (LLMs) in academic writing, which undermines citation reliability and scholarly integrity. To quantify the prevalence and disciplinary variation of citation hallucination in the LLM era, the authors introduce CiteVerifier, an open-source citation verification framework, and use it to run three experiments spanning 13 LLMs, 40 research domains, and 56,381 papers from top-tier AI/ML and Security venues. Benchmarking reveals hallucination rates from 14.23% to 94.93% across all evaluated models; 1.07% of the papers published at these venues between 2020 and 2025 (604 papers) contain invalid or fabricated references, with an 80.9% increase in 2025 alone. Moreover, a survey of researchers exposes a critical "verification gap": 41.5% copy-paste BibTeX entries without checking them, while 76.7% of reviewers do not thoroughly check references and 80.0% never suspect fake citations. The paper concludes by proposing intervention strategies for researchers, venues, and tool developers to mitigate these risks.
📝 Abstract
Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, that trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, yet their tendency to fabricate citations ("ghost citations") poses a systemic threat to citation validity. To quantify this threat and inform mitigation, we develop CiteVerifier, an open-source framework for large-scale citation verification, and use it to conduct the first comprehensive study of citation validity in the LLM era through three experiments. First, we benchmark 13 state-of-the-art LLMs on citation generation across 40 research domains, finding that every model hallucinates citations, at rates from 14.23% to 94.93%, with significant variation across domains. Second, we analyze 2.2 million citations from 56,381 papers published at top-tier AI/ML and Security venues (2020–2025), finding that 1.07% of papers (604 papers) contain invalid or fabricated citations, with an 80.9% increase in 2025 alone. Third, we survey 97 researchers (94 valid responses after removing 3 conflicting samples), revealing a critical "verification gap": 41.5% of researchers copy-paste BibTeX entries without checking them, and 44.4% take no action when they encounter suspicious references; meanwhile, 76.7% of reviewers do not thoroughly check references, and 80.0% never suspect fake citations. Our findings reveal an accelerating crisis in which unreliable AI tools, inadequate verification by authors, and insufficient peer-review scrutiny combine to let fabricated citations contaminate the scientific record. We propose interventions for researchers, venues, and tool developers to protect citation integrity.
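To make the idea of automated citation verification concrete: the abstract does not describe CiteVerifier's internals, but one basic check such a framework might perform is matching a cited title against an index of known publications, tolerating minor formatting differences. The sketch below is purely illustrative; the function names, the local index, and the fuzzy-matching threshold are all assumptions, not the paper's actual method (a real pipeline would query bibliographic databases rather than a hard-coded list).

```python
import re
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so formatting differences don't matter."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def verify_citation(cited_title: str, known_titles: list[str],
                    threshold: float = 0.9) -> bool:
    """Return True if the cited title closely matches any known publication."""
    target = normalize(cited_title)
    return any(
        SequenceMatcher(None, target, normalize(known)).ratio() >= threshold
        for known in known_titles
    )

# Hypothetical mini-index standing in for a bibliographic database.
index = [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]

print(verify_citation("Attention is all you need.", index))       # True: real paper
print(verify_citation("Quantum Blockchain Transformers", index))  # False: likely fabricated
```

A title that survives this check may still be misattributed (wrong authors, venue, or year), which is why frameworks of this kind typically combine several automated signals with manual validation.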