🤖 AI Summary
This study investigates the real-world contributions of autonomous code-generating agents in open-source software projects and their impact on code quality and maintainability. Leveraging a novel dataset of approximately 110,000 pull requests, the work presents the first large-scale longitudinal analysis comparing five prominent agent types, tracking merge outcomes, developer interactions, and the long-term evolution of agent-generated code. The findings reveal that, despite steadily increasing agent involvement, code produced by these agents exhibits significantly lower retention stability than human-authored code: it is more frequently modified or removed in subsequent commits. This points to a critical challenge in the long-term maintenance burden introduced by current AI coding agents, suggesting that their integration into collaborative software development may carry hidden sustainability costs.
📝 Abstract
The rise of large language models for code has reshaped software development. Autonomous coding agents, able to create branches, open pull requests, and perform code reviews, now actively contribute to real-world projects. Their growing role offers a unique and timely opportunity to investigate AI-driven contributions and their effects on code quality, team dynamics, and software maintainability. In this work, we construct a novel dataset of approximately 110,000 open-source pull requests, including associated commits, comments, reviews, issues, and file changes, collectively representing millions of lines of source code. We compare five popular coding agents (OpenAI Codex, Claude Code, GitHub Copilot, Google Jules, and Devin), examining how their usage differs across development aspects such as merge frequency, edited file types, and developer interaction signals, including comments and reviews. Furthermore, we emphasize that code authoring and review are only a small part of the larger software engineering process, as the resulting code must also be maintained and updated over time. Hence, we offer several longitudinal estimates of survival and churn rates for agent-generated versus human-authored code. Ultimately, our findings indicate increasing agent activity in open-source projects, although agent contributions are associated with more churn over time than human-authored code.
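The survival and churn estimates mentioned above can be made concrete with a minimal sketch. The code below is illustrative only and is not the paper's methodology: it assumes line-level tracking in which each line introduced by a pull request carries a stable identifier, and later commits report which of those identifiers were removed or modified. The function name and diff format are hypothetical.

```python
def survival_rate(introduced_lines, later_diffs):
    """Fraction of a PR's introduced lines that remain untouched
    after a sequence of later commits.

    introduced_lines: set of stable line identifiers added by the PR.
    later_diffs: list of dicts with optional "removed"/"modified" keys,
                 each listing line identifiers affected by one commit.
    """
    surviving = set(introduced_lines)
    for diff in later_diffs:
        # A line that is deleted or rewritten no longer "survives".
        surviving -= set(diff.get("removed", []))
        surviving -= set(diff.get("modified", []))
    return len(surviving) / len(introduced_lines) if introduced_lines else 1.0


# Hypothetical example: an agent PR adds four lines; later commits
# delete one and rewrite another, so half the contribution survives.
introduced = {"L1", "L2", "L3", "L4"}
later = [{"removed": ["L1"]}, {"modified": ["L2"]}]
print(survival_rate(introduced, later))  # → 0.5
```

Churn is then the complement of survival (here, 1 − 0.5 = 0.5): the lower the survival rate of agent-generated code relative to human-authored code, the higher its implied long-term maintenance cost.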