🤖 AI Summary
Current classifiers struggle to determine which specific large language model (LLM) generated a given academic text, and prior detection work has largely overlooked the heterogeneous effects introduced by different models and prompting strategies. This work proposes an interpretable linear framework that explicitly accounts for variation across models and prompts, enabling a quantitative assessment of LLM influence on real-world academic writing through the lexical dynamics of arXiv paper titles and abstracts. The study reveals significant shifts in the usage frequencies of specific words—such as “beyond” and “via”—and, through large-scale corpora and multi-class classification experiments, demonstrates that LLM adoption has systematically altered academic language patterns. These findings highlight the limitations of existing provenance detection methods and offer a nuanced perspective on the impact of LLMs on scholarly writing.
📝 Abstract
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of “beyond” and “via” in titles and the decreased frequency of “the” and “of” in abstracts. Due to the similarities among different LLMs, experiments show that current classifiers struggle to accurately determine which specific model generated a given text in multi-class classification tasks. Meanwhile, variations across LLMs also result in evolving patterns of word usage in academic papers. By adopting a direct and highly interpretable linear approach and accounting for differences between models and prompts, we quantitatively assess these effects and show that real-world LLM usage is heterogeneous and dynamic.
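The kind of frequency-shift analysis the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the toy corpus, the `keyword_frequency` helper, and the whitespace tokenization are all assumptions made for the example; real data would come from arXiv title/abstract metadata.

```python
from collections import Counter

# Hypothetical (year, title) pairs standing in for arXiv metadata.
corpus = [
    (2020, "A survey of neural machine translation"),
    (2020, "Learning representations via contrastive objectives"),
    (2023, "Beyond attention: efficient transformers via sparse routing"),
    (2023, "Scaling laws beyond the compute-optimal frontier"),
]

def keyword_frequency(corpus, keyword):
    """Relative frequency of `keyword` among all title tokens, per year."""
    totals, hits = Counter(), Counter()
    for year, title in corpus:
        tokens = title.lower().split()   # naive tokenization, for illustration
        totals[year] += len(tokens)
        hits[year] += tokens.count(keyword.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}

print(keyword_frequency(corpus, "beyond"))  # rises from 2020 to 2023 in this toy data
print(keyword_frequency(corpus, "via"))
```

Comparing such per-year frequencies before and after widespread LLM adoption is the simplest version of the shift measurement; the paper's linear framework additionally conditions on which model and prompt produced the text.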