🤖 AI Summary
This study investigates the long-term maintainability of large language model–generated code in real-world software projects, focusing on maintenance frequency, human involvement, and types of modifications. Leveraging the AIDev dataset and GitHub repositories, the authors conduct a large-scale empirical analysis of over 1,000 AI-generated and human-written code files across 100 popular repositories, integrating commit histories, change pattern classification, and statistical comparisons. The work presents the first systematic quantification of maintenance characteristics of AI-generated code, revealing that such code undergoes less frequent maintenance and smaller-magnitude changes compared to human-written code. Notably, modifications to AI-generated code are most often feature extensions rather than bug fixes, and human developers carry out most of this maintenance. These results challenge the prevailing assumption that AI-generated code requires frequent correction and highlight its stability and practical utility.
📝 Abstract
LLM-based autonomous coding agents have reshaped software development. While these agents excel at code generation, open questions persist about the long-term maintainability of AI-generated code. This study empirically investigates the maintenance extent, human involvement, and modification types of AI-generated files versus human-authored code. Using the AIDev dataset of AI-generated pull requests together with data from GitHub, we analyzed over 1,000 files and approximately 3,200 changes from 100 popular repositories. Our findings show that: (i) AI-generated files receive less frequent maintenance than human-authored code, with updates affecting only a small fraction of file size; (ii) the most frequent modifications to AI code are feature extensions, whereas human updates focus on bug fixes; and (iii) human developers perform the large majority of this maintenance.