code_transformed: The Influence of Large Language Models on Code

📅 2025-06-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether and how large language models (LLMs) systematically reshape real-world programming styles. Method: Using temporal code data from over 19,000 GitHub repositories linked to arXiv papers published between 2020 and 2025, we combine static analysis, cross-repository style comparison, time-series modeling, and LLM inference tracing to quantify trends in naming conventions (e.g., snake_case), complexity, maintainability, and code similarity. Contribution/Results: We present the first large-scale empirical evidence that LLMs have measurably influenced real-world coding style: the share of snake_case variable names in Python rose from 47% in Q1 2023 to 51% in Q1 2025. Moreover, the style trends observed in open-source code align with stylistic features of LLM-generated code. Because the proportion of code written or assisted by LLMs cannot be estimated precisely, these findings serve as an interpretable reference point for studying AI-driven shifts in software engineering rather than as a direct measure of LLM usage.

📝 Abstract

Coding remains one of the most fundamental modes of interaction between humans and machines. With the rapid advancement of Large Language Models (LLMs), code generation capabilities have begun to significantly reshape programming practices. This development prompts a central question: Have LLMs transformed code style, and how can such transformation be characterized? In this paper, we present a pioneering study that investigates the impact of LLMs on code style, with a focus on naming conventions, complexity, maintainability, and similarity. By analyzing code from over 19,000 GitHub repositories linked to arXiv papers published between 2020 and 2025, we identify measurable trends in the evolution of coding style that align with characteristics of LLM-generated code. For instance, the proportion of snake_case variable names in Python code increased from 47% in Q1 2023 to 51% in Q1 2025. Furthermore, we investigate how LLMs approach algorithmic problems by examining their reasoning processes. Given the diversity of LLMs and usage scenarios, among other factors, it is difficult or even impossible to precisely estimate the proportion of code generated or assisted by LLMs. Our experimental results provide the first large-scale empirical evidence that LLMs affect real-world programming style.
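The naming-convention metric cited in the abstract (the share of snake_case variable names) can be illustrated with a minimal sketch. This is not the authors' pipeline: the regular expression, the choice to count only assignment targets, and the helper names `assigned_names` and `snake_case_share` are all assumptions made for illustration.

```python
import ast
import re

# Assumption: a name counts as snake_case only if it is all lowercase
# and contains at least one underscore separating two word parts.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$")


def assigned_names(source: str) -> list[str]:
    """Collect every variable name that appears as an assignment target."""
    tree = ast.parse(source)
    return [
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]


def snake_case_share(source: str) -> float:
    """Return the fraction of assigned names that match SNAKE_CASE."""
    names = assigned_names(source)
    if not names:
        return 0.0
    hits = sum(1 for name in names if SNAKE_CASE.match(name))
    return hits / len(names)


# Hypothetical sample input, not taken from the studied repositories.
sample = """
total_count = 0
maxValue = 10
x = 1
for item_index in range(3):
    total_count += item_index
"""

print(f"snake_case share: {snake_case_share(sample):.0%}")
```

Running the same function over snapshots of a repository taken in different quarters would give a per-quarter share comparable in spirit to the 47% and 51% figures, although the paper's exact definition of a snake_case name may differ from the one assumed here.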

Problem

Research questions and friction points this paper is trying to address.

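Whether LLMs have transformed code style, and how that transformation can be characterized.

How LLMs affect naming conventions, complexity, maintainability, and code similarity.

Which measurable trends in real-world coding practice align with characteristics of LLM-generated code.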

Innovation

Methods, ideas, or system contributions that make the work stand out.

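Evidence of a measurable shift in naming conventions: snake_case variable names in Python rose from 47% in Q1 2023 to 51% in Q1 2025, a four-percentage-point increase.

Trend analysis of code style over time across more than 19,000 GitHub repositories linked to arXiv papers.

First large-scale empirical study of the impact of LLMs on real-world programming style.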

🔎 Similar Papers

Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation