🤖 AI Summary
This study investigates how large language models (LLMs) influence the core evaluative functions of academic peer review. It compares linguistic features and assessment dimensions in review reports from top-tier AI conferences before and after the widespread adoption of LLMs, using text complexity analysis, automated annotation of evaluation aspects, and maximum likelihood estimation, and provides fine-grained evidence of how LLMs are reshaping review content. Findings reveal that post-LLM reviews are longer, more fluent, and more linguistically standardized, with greater emphasis on summaries and surface-level clarity, particularly among low-confidence reviewers. However, attention to deeper evaluative dimensions such as originality and replicability has declined, suggesting that LLM use may erode the critical depth essential to rigorous peer review.
📝 Abstract
With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in the realm of academic communication. The primary function of peer review is to improve the quality of academic manuscripts along evaluation aspects such as clarity and originality. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer-review reports has yet to be systematically examined. In this study, we examine the changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at a fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, and we automatically annotate the evaluation aspects of individual review sentences. We also use a previously established maximum likelihood estimation method to identify review reports that have potentially been modified or generated by LLMs. Finally, we assess how the evaluation aspects mentioned in LLM-assisted review reports affect the informativeness of recommendations for paper decision-making. The results indicate that following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity, as well as more standardized linguistic patterns, particularly among reviewers with lower confidence scores. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.
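The abstract does not specify the maximum likelihood estimator beyond calling it previously established, so the following is only an illustrative sketch of one common formulation and should not be read as the authors' implementation. It assumes each review has already been scored with a likelihood under a human-written reference distribution and under an LLM-generated one, and it estimates the corpus-level fraction of LLM-influenced text as the mixture weight `alpha` that maximizes the log-likelihood. The function names, the input format, and the toy numbers are all hypothetical.

```python
import math


def mixture_log_likelihood(alpha, p_human, p_llm):
    """Log-likelihood of the corpus under (1 - alpha) * P_human + alpha * P_llm."""
    return sum(
        math.log((1.0 - alpha) * ph + alpha * pl)
        for ph, pl in zip(p_human, p_llm)
    )


def estimate_llm_fraction(p_human, p_llm, tol=1e-6):
    """Estimate alpha in [0, 1] by ternary search.

    The log-likelihood is concave in alpha (a sum of logs of functions
    linear in alpha), so ternary search converges to its maximizer.
    Assumes every document has a positive likelihood under at least
    one of the two distributions.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, p_human, p_llm) < mixture_log_likelihood(
            m2, p_human, p_llm
        ):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2.0


# Hypothetical toy corpus: 7 reviews that look human-written and
# 3 that look LLM-generated, each with made-up likelihoods.
p_human = [0.9] * 7 + [0.1] * 3
p_llm = [0.1] * 7 + [0.9] * 3

alpha_hat = estimate_llm_fraction(p_human, p_llm)
print(f"estimated LLM fraction: {alpha_hat:.3f}")
```

An estimate of this kind describes the corpus as a whole rather than labeling any single review, which is why it suits before-and-after comparisons of the sort the abstract describes.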