🤖 AI Summary
This study quantifies the real-world penetration of large language models (LLMs) in biomedical academic writing. Method: Drawing on more than 15 million PubMed abstracts published between 2010 and 2024, the authors present an unsupervised, large-scale approach to detecting LLM influence based on excess vocabulary: they identify "style words" whose frequency rose abruptly after the release of LLMs, enabling objective comparisons across time, disciplines, countries, and journals. Contribution/Results: By 2024, at least 13.5% of biomedical abstracts show evidence of LLM involvement, rising to roughly 40% in some subcorpora; this impact on scholarly writing exceeds that of major world events such as the COVID-19 pandemic. The work provides empirical evidence that LLMs have become the most salient external factor shaping scientific writing, and offers a reference point for publishing ethics, quality assurance, and science policy.
📝 Abstract
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the COVID-19 pandemic.
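To make the excess word analysis concrete, below is a minimal sketch of the underlying idea, not the authors' exact implementation: for each word, extrapolate its expected 2024 frequency from pre-LLM baseline years and flag words whose observed frequency clearly exceeds that counterfactual. The function names, tokenization, choice of baseline years, and thresholds are all illustrative assumptions.

```python
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts containing each word at least once (naive whitespace tokenization)."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}

def excess_words(freq_by_year, target_year=2024, baseline=(2021, 2022),
                 min_gap=0.001, min_ratio=2.0):
    """Flag words whose target-year frequency exceeds a linear extrapolation
    of the pre-LLM trend, by an absolute frequency gap or a frequency ratio.

    freq_by_year: dict mapping year -> {word: fraction of abstracts containing it}.
    Returns {word: (observed, expected, gap, ratio)} for flagged words.
    """
    f1, f2 = freq_by_year[baseline[0]], freq_by_year[baseline[1]]
    flagged = {}
    for word, observed in freq_by_year[target_year].items():
        a, b = f1.get(word, 0.0), f2.get(word, 0.0)
        # Extrapolate the trend between the two baseline years to the target year;
        # clip at a small positive value so the ratio stays well defined.
        expected = max(b + (b - a) * (target_year - baseline[1]), 1e-9)
        gap, ratio = observed - expected, observed / expected
        if gap >= min_gap or ratio >= min_ratio:
            flagged[word] = (observed, expected, gap, ratio)
    return flagged
```

In this sketch the gap measures how many extra abstracts, as a fraction of all abstracts in the target year, contain a given word relative to the pre-LLM trend; under the assumption that this excess is driven by LLM-assisted writing, a single word's gap is already a conservative lower bound on the share of processed abstracts, since each abstract containing the word is counted at most once.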