🤖 AI Summary
Current AI-generated text detection paradigms overlook the dynamic, bidirectional influence between human authors and large language models (LLMs) in academic writing. Method: Leveraging a million-word time-series corpus of arXiv abstracts (2007–2024), the authors develop a comparative trend modeling framework and an anomaly fluctuation detection method to analyze lexical evolution. Contribution/Results: The study provides the first empirical evidence of deliberate human curation—i.e., selective editing and filtering—of LLM outputs, manifested as a sharp decline in LLM-preferred terms (e.g., “delve”) alongside sustained growth in conventional academic terms (e.g., “significant”), yielding a non-monotonic, divergent lexical trajectory. It introduces the “human–AI co-evolutionary lexical shift” metric—a quantifiable, temporally sensitive indicator for assessing LLM impact—thereby fundamentally reframing the detectability landscape for covert human–AI collaborative texts.
📝 Abstract
With a statistical analysis of arXiv paper abstracts, we report a marked drop in the frequency of several words previously identified as overused by ChatGPT, such as "delve", starting soon after they were pointed out in early 2024. The frequency of certain other words favored by ChatGPT, such as "significant", has instead kept increasing. These phenomena suggest that some authors of academic papers have adapted their use of large language models (LLMs), for example, by selecting outputs or applying modifications to the LLM-generated content. Such coevolution and cooperation between humans and LLMs thus introduce additional challenges to the detection of machine-generated text in real-world scenarios. Estimating the impact of LLMs on academic writing by examining word frequency remains feasible, and more attention should be paid to words that were already frequently employed, including those that have decreased in frequency.
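The frequency analysis described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes a hypothetical corpus mapping each year to a list of abstract texts, and measures the fraction of abstracts per year that contain a given word (whole-word, case-insensitive).

```python
import re

def word_rate(abstracts, word):
    """Fraction of abstracts containing `word` as a whole word, case-insensitively."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts) if abstracts else 0.0

def yearly_trend(abstracts_by_year, word):
    """Per-year usage rate of `word`; a post-2023 drop would mirror the reported
    decline of LLM-preferred terms such as "delve"."""
    return {year: word_rate(texts, word)
            for year, texts in sorted(abstracts_by_year.items())}

# Toy example with invented data (NOT the paper's corpus):
corpus = {
    2023: ["We delve into transformer scaling.", "A significant improvement is shown."],
    2024: ["We investigate transformer scaling.", "Significant gains are reported."],
}
print(yearly_trend(corpus, "delve"))        # rate falls from 2023 to 2024
print(yearly_trend(corpus, "significant"))  # rate stays stable
```

On real data one would normalize by total word counts or abstract counts per year and compare trajectories across a fixed word list, as the comparative trend analysis in the paper does.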