Have LLM-associated terms increased in article full texts in all fields?

📅 2026-04-08
🤖 AI Summary
This study addresses the lack of a systematic, science-wide picture of how large language model (LLM)-associated terminology has spread in article full texts. Using a corpus of 1.25 million full-text articles published by MDPI between 2021 and 2025, it tracks the prevalence of 80 potentially LLM-associated terms. The analysis shows that these terms became more prevalent up to 2024, with the "underscore" term family increasing up to 29-fold; from 2024 to 2025 some terms declined while others continued to increase. Apparent LLM uptake varies substantially between journals, from lower uptake in the life sciences to higher uptake in the social sciences, electronic engineering and environmental science. The work provides a cross-disciplinary empirical basis for understanding how LLMs are changing scholarly writing.
📝 Abstract
The use of Large Language Models (LLMs) like ChatGPT and DeepSeek for translation and language polishing is a welcome development, reducing the longstanding publishing barrier to non-English speakers. Assessing the uptake of this facility is useful to give insights into the changing nature of scientific writing. Although the prevalence of LLM-associated terms has been tracked across science in abstracts and for full text biomedical research, their science-wide prevalence in full texts is unknown. In response, this article investigates an expanded set of 80 potentially LLM-associated terms during 2021-2025 in a science-wide full text collection from the publisher MDPI (1.25 million articles), partly focusing on the 73 journals that published at least 500 articles in 2021. The results demonstrate the increasing prevalence of LLM-associated terms science-wide in full texts to 2024, with some terms declining from 2024 to 2025 and others continuing to increase. LLMs seem to avoid some terms (e.g., thus, moreover) and a few terms have stronger associations with abstracts than full texts (e.g., enhanced) or the opposite (e.g., leveraged). The term family "underscore" had the biggest increase: up to 29-fold. There are substantial differences between journals in the apparent use of LLMs for writing, from lower uptake in the life sciences to higher uptake in social sciences, electronic engineering and environmental science. Fields in which there is currently low uptake may need improved or specialist support, such as for reliably translating complex formulae, before the full benefits of automatic translation can be realised.
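The abstract does not say exactly how prevalence or the "fold" increase was computed. As a minimal sketch, assuming prevalence means the share of articles in a year whose full text contains at least one word from a term family, and that a fold increase is the ratio of two such shares, the calculation might look like the following. The function names, the stem-matching rule, and the toy corpus are all hypothetical illustrations and are not taken from the paper.

```python
import re
from collections import defaultdict


def term_prevalence(articles, stem):
    """Return {year: share of articles containing a word that starts with `stem`}.

    `articles` is an iterable of (year, full_text) pairs. Matching on a stem is
    an assumed way of grouping a term family such as underscore/underscores/
    underscored/underscoring.
    """
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in articles:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def fold_change(prevalence, start_year, end_year):
    """Ratio of end-year to start-year prevalence; None if the baseline is zero."""
    base = prevalence[start_year]
    if base == 0:
        return None
    return prevalence[end_year] / base


# Invented toy corpus for illustration only; these are not the paper's data.
toy_articles = [
    (2021, "These results underscore the need for further work."),
    (2021, "We measured the samples twice."),
    (2021, "Thus, the method is robust."),
    (2021, "The data were collected in 2020."),
    (2024, "Our findings underscore the role of sampling."),
    (2024, "This underscores a key limitation."),
    (2024, "The analysis underscored regional differences."),
    (2024, "We measured the samples twice."),
]

prevalence = term_prevalence(toy_articles, "underscor")
print(prevalence)
print(fold_change(prevalence, 2021, 2024))
```

Counting articles rather than raw word occurrences keeps one long article that repeats a term from dominating a year's figure; whether the paper made the same choice is not stated in the abstract.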
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
scientific writing
LLM-associated terms
full text analysis
cross-disciplinary uptake
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
scientific writing
full-text analysis
terminology tracking
cross-disciplinary comparison
Mike Thelwall
School of Information, Journalism and Communication, The University of Sheffield
scientometrics, altmetrics, sentiment analysis, social media, artificial intelligence
Kayvan Kousha
Statistical Cybermetrics and Research Evaluation Group, Business School, University of Wolverhampton, UK