Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

📅 2026-03-26
🤖 AI Summary
Current classifiers struggle to determine which large language model (LLM) generated a given academic text, and they often overlook the heterogeneous effects introduced by different models and prompting strategies. This work proposes an interpretable linear framework that explicitly accounts for variation across models and prompts, and uses it to quantify LLM influence on real academic writing by analyzing word-usage changes in arXiv paper titles and abstracts. The study reports shifts in the usage frequencies of specific words, such as increased use of “beyond” and “via” in titles and decreased use of “the” and “of” in abstracts. Its multi-class classification experiments show that similarities among LLMs make model attribution difficult. The results suggest that real-world LLM usage is heterogeneous and changes over time, which limits existing provenance detection methods.

📝 Abstract
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Experiments show that, because different LLMs are similar to one another, current classifiers struggle to determine which specific model generated a given text in multi-class classification tasks. At the same time, the differences that do exist across LLMs result in evolving patterns of word usage in academic papers. By adopting a direct and highly interpretable linear approach and accounting for differences between models and prompts, we quantitatively assess these effects and show that real-world LLM usage is heterogeneous and dynamic.
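To make the idea of a linear, interpretable estimate concrete, here is a minimal Python sketch. It is not the paper's method, which the abstract does not specify. It assumes a simple two-component mixture in which the observed frequency of each word equals (1 - alpha) times a human baseline frequency plus alpha times an LLM frequency, and it solves for alpha by least squares. The function names `word_freq` and `estimate_llm_share` and all example numbers are hypothetical.

```python
import re
from collections import Counter


def word_freq(texts, vocab):
    """Fraction of texts that contain each vocabulary word at least once."""
    if not texts:
        raise ValueError("texts must not be empty")
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (a simplifying assumption).
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for word in vocab:
            if word in tokens:
                counts[word] += 1
    return {word: counts[word] / len(texts) for word in vocab}


def estimate_llm_share(f_obs, f_human, f_llm):
    """Least-squares alpha for the assumed model
    f_obs[w] = (1 - alpha) * f_human[w] + alpha * f_llm[w].

    Rearranged: f_obs[w] - f_human[w] = alpha * (f_llm[w] - f_human[w]),
    which is a one-parameter regression through the origin.
    """
    num = 0.0
    den = 0.0
    for word in f_obs:
        d = f_llm[word] - f_human[word]
        num += d * (f_obs[word] - f_human[word])
        den += d * d
    if den == 0.0:
        raise ValueError("human and LLM frequencies are identical")
    return num / den


if __name__ == "__main__":
    # Illustrative, made-up frequencies; not results from the paper.
    f_human = {"beyond": 0.01, "via": 0.02}
    f_llm = {"beyond": 0.05, "via": 0.10}
    f_obs = {"beyond": 0.02, "via": 0.04}
    print(estimate_llm_share(f_obs, f_human, f_llm))
```

A single alpha treats all LLM usage as one source. Accounting for several models and prompts, as the abstract describes, would need one frequency profile and one coefficient per source, which turns the estimate into a multi-variable linear regression.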
Problem

Research questions and friction points this paper is trying to address.

large language models
academic writing
word usage shifts
text classification
LLM impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
word usage patterns
interpretable linear modeling
LLM heterogeneity
text classification