🤖 AI Summary
This study quantifies the real-world penetration of large language models (LLMs) in biomedical academic writing. Method: Drawing on more than 15 million PubMed abstracts published between 2010 and 2024, the authors present an unsupervised, large-scale approach to detecting LLM influence based on excess vocabulary: they identify "style words" whose frequency rose abruptly after the release of LLMs, enabling objective comparisons across time, disciplines, countries, and journals. Contribution/Results: By 2024, at least 13.5% of biomedical abstracts show evidence of LLM involvement, rising to roughly 40% in some subcorpora; this impact on scholarly writing exceeds that of major world events such as the COVID-19 pandemic. The work provides empirical evidence that LLMs have become the most salient external factor shaping scientific writing, and offers a reference point for publishing ethics, quality assurance, and science policy.
📝 Abstract
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the COVID-19 pandemic.
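To make the excess word analysis concrete, below is a minimal sketch of the underlying idea, not the authors' exact implementation: for each word, extrapolate its expected 2024 frequency from pre-LLM baseline years and flag words whose observed frequency clearly exceeds that counterfactual. The function names, tokenization, choice of baseline years, and thresholds are all illustrative assumptions.

```python
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts containing each word at least once (naive whitespace tokenization)."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}

def excess_words(freq_by_year, target_year=2024, baseline=(2021, 2022),
                 min_gap=0.001, min_ratio=2.0):
    """Flag words whose target-year frequency exceeds a linear extrapolation
    of the pre-LLM trend, by an absolute frequency gap or a frequency ratio.

    freq_by_year: dict mapping year -> {word: fraction of abstracts containing it}.
    Returns {word: (observed, expected, gap, ratio)} for flagged words.
    """
    f1, f2 = freq_by_year[baseline[0]], freq_by_year[baseline[1]]
    flagged = {}
    for word, observed in freq_by_year[target_year].items():
        a, b = f1.get(word, 0.0), f2.get(word, 0.0)
        # Extrapolate the trend between the two baseline years to the target year;
        # clip at a small positive value so the ratio stays well defined.
        expected = max(b + (b - a) * (target_year - baseline[1]), 1e-9)
        gap, ratio = observed - expected, observed / expected
        if gap >= min_gap or ratio >= min_ratio:
            flagged[word] = (observed, expected, gap, ratio)
    return flagged
```

In this sketch the gap measures how many extra abstracts, as a fraction of all abstracts in the target year, contain a given word relative to the pre-LLM trend; under the assumption that this excess is driven by LLM-assisted writing, a single word's gap is already a conservative lower bound on the share of processed abstracts, since each abstract containing the word is counted at most once.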