GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints

📅 2025-05-22
🤖 AI Summary
This paper addresses the question of whether large language models (LLMs) are used for global polishing or for local content generation in academic preprints. We propose an interpretable bibliometric analysis framework that combines PELT changepoint detection with a Bayesian classifier, leveraging stylistic feature modeling and GPT-regenerated texts to construct training data and to quantify where within a document LLM traces appear. Empirical results reveal strong global uniformity in LLM intervention, indicating predominant use for holistic revision (e.g., grammatical correction, stylistic harmonization) rather than piecemeal content generation. This finding suggests a reduced risk of hallucinated text infiltrating preprints and establishes the first interpretable, reproducible detection paradigm for academic integrity assessment, grounded in stylistic segmentation and changepoint analysis.

📝 Abstract
The proliferation of Large Language Models (LLMs) since late 2022 has impacted academic writing, threatening credibility and causing institutional uncertainty. We seek to determine the degree to which LLMs are used to generate critical text, as opposed to being used for editing, such as checking for grammar errors or inappropriate phrasing. In our study, we analyze arXiv papers for stylistic segmentation, which we measure by varying a PELT threshold against a Bayesian classifier trained on GPT-regenerated text. We find that LLM-attributed language is not predictive of stylistic segmentation, suggesting that when authors use LLMs, they do so uniformly, reducing the risk of hallucinations being introduced into academic preprints.
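The abstract's classifier is a Bayesian model over stylistic features, trained to separate original text from GPT-regenerated text. The paper's implementation is not reproduced here; the following is only a minimal Gaussian naive-Bayes sketch, where the feature vectors and the class labels "human"/"gpt" are illustrative assumptions, not names taken from the paper:

```python
import math

def _mean_var(col):
    """Sample mean and (floored) variance of one feature column."""
    mu = sum(col) / len(col)
    var = sum((v - mu) ** 2 for v in col) / len(col)
    return mu, max(var, 1e-9)  # floor avoids division by zero

class StylisticNaiveBayes:
    """Gaussian naive Bayes over per-segment stylistic feature vectors.
    Classes might be e.g. 'human' vs 'gpt' (GPT-regenerated) text."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = {}  # log P(class)
        self.stats = {}   # per-class (mean, variance) for each feature
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = math.log(len(rows) / len(X))
            self.stats[c] = [_mean_var(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        def log_posterior(c):
            lp = self.priors[c]
            for v, (mu, var) in zip(x, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * var) \
                      - (v - mu) ** 2 / (2 * var)
            return lp
        return max(self.classes, key=log_posterior)
```

In practice the features could be quantities such as sentence-length statistics or vocabulary-richness scores computed per text segment; those choices are assumptions here, since the page does not list the paper's feature set.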
Problem

Research questions and friction points this paper is trying to address.

Assess LLM usage in academic writing vs editing
Measure stylistic impact of LLMs on arXiv papers
Determine uniformity of LLM use to reduce hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stylistic segmentation analysis of arXiv papers
Bayesian classifier trained on GPT-regenerated text
PELT threshold variation for measurement
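PELT (Pruned Exact Linear Time) is a penalized dynamic program that finds changepoints while pruning candidates that can never be optimal. As a rough illustration only, not the authors' code, a minimal mean-shift version over a one-dimensional series of per-segment stylistic scores (an assumed input; the paper works with richer stylistic features) could look like:

```python
def pelt_mean_shift(x, pen):
    """PELT changepoint detection for shifts in mean under a
    squared-error segment cost; returns sorted changepoint indices."""
    n = len(x)
    s1 = [0.0] * (n + 1)  # prefix sums
    s2 = [0.0] * (n + 1)  # prefix sums of squares
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        """Sum of squared deviations of x[a:b] from its segment mean."""
        seg = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)

    F = [0.0] * (n + 1)   # F[t]: optimal penalized cost of x[:t]
    F[0] = -pen
    prev = [0] * (n + 1)  # backpointer to the last changepoint
    cands = [0]           # candidate set (the "pruned" part of PELT)
    for t in range(1, n + 1):
        F[t], prev[t] = min((F[s] + cost(s, t) + pen, s) for s in cands)
        # PELT pruning: drop s that can never again be optimal
        cands = [s for s in cands if F[s] + cost(s, t) <= F[t]]
        cands.append(t)

    cps, t = [], n        # backtrack the changepoints
    while prev[t] > 0:
        cps.append(prev[t])
        t = prev[t]
    return sorted(cps)
```

The penalty `pen` plays the role of the "PELT threshold" varied in the paper: a larger penalty demands stronger stylistic shifts before a changepoint is declared, so sweeping it probes how segmented a paper's style is. For example, `pelt_mean_shift([0.0]*50 + [5.0]*50, 10.0)` recovers the single break at index 50.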
Soren DeHaan
Department of Computer Science, Indiana University, Bloomington
Yuanze Liu
Cognitive Science Program, Indiana University, Bloomington
Johan Bollen
Professor of Informatics and Cognitive Science, Indiana University
network science · complex systems · computational social science · cognitive science · bibliometrics
Saúl A. Blanco
Department of Computer Science, Indiana University, Bloomington