The Provenance Problem: LLMs and the Breakdown of Citation Norms

📅 2025-09-15
🤖 AI Summary
Generative AI in scholarly writing introduces a “provenance problem”: AI systems may inadvertently reproduce ideas from obscure or inaccessible literature (e.g., a 1975 paper unknown to the author), resulting in uncredited knowledge appropriation that is unintentional yet ethically significant. This phenomenon challenges established definitions of plagiarism and citation norms, undermines the integrity of academic credit systems, and remains unaddressed by current research-ethics frameworks. Drawing on philosophical analysis and the sociology of science, the study develops an original conceptual framework for the provenance problem, employing case-based reasoning and normative theory construction. It systematically identifies and defines this novel form of attributional harm, elucidating AI’s implications for authorship, epistemic provenance, and scholarly justice. The work further proposes governance pathways that balance epistemic justice with practical feasibility, thereby addressing a critical theoretical and operational gap in AI-era attribution ethics.

📝 Abstract
The increasing use of generative AI in scientific writing raises urgent questions about attribution and intellectual credit. When a researcher employs ChatGPT to draft a manuscript, the resulting text may echo ideas from sources the author has never encountered. If an AI system reproduces insights from, for example, an obscure 1975 paper without citation, does this constitute plagiarism? We argue that such cases exemplify the 'provenance problem': a systematic breakdown in the chain of scholarly credit. Unlike conventional plagiarism, this phenomenon does not involve intent to deceive: researchers may disclose AI use and act in good faith, yet they still benefit from the uncredited intellectual contributions of others. This dynamic creates a novel category of attributional harm that current ethical and professional frameworks fail to address. As generative AI becomes embedded across disciplines, the risk that significant ideas will circulate without recognition threatens both the reputational economy of science and the demands of epistemic justice. This Perspective analyzes how AI challenges established norms of authorship, introduces conceptual tools for understanding the provenance problem, and proposes strategies to preserve integrity and fairness in scholarly communication.
Problem

Research questions and friction points this paper is trying to address.

AI-generated text risks uncredited use of sources
Breakdown in scholarly attribution without deceptive intent
Current ethical frameworks fail to address AI attribution harms
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI challenges authorship norms
Introduces provenance problem concept
Proposes integrity preservation strategies
Brian D. Earp
Associate Professor, National University of Singapore and Research Associate, University of Oxford
Bioethics · Philosophy of Science & AI · Relational Moral Psychology · Sex & Gender · Children's Rights
Haotian Yuan
Georgetown Preparatory School, Bethesda, Maryland, United States
Julian Koplin
Monash Bioethics Centre, Monash University, Melbourne, Victoria, Australia
Sebastian Porsdam Mann
Centre for Advanced Studies in Bioscience Innovation Law (CeBIL), Faculty of Law, University of Copenhagen