How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

📅 2026-01-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic empirical evidence on how AI coding agents and human developers differ in their code changes and in the consistency between pull request (PR) descriptions and the actual modifications. Using the MSR 2026 Mining Challenge version of the AIDev dataset, it characterizes AI agent contributions to real-world open-source projects at scale, comparing 24,014 merged agent-generated PRs (440,295 commits) with 5,081 merged human-authored PRs (23,242 commits) along several dimensions: lines added and deleted, commit count, number of files touched, and lexical and semantic similarity between PR descriptions and code diffs. Agentic PRs differ substantially from human PRs in commit count (Cliff's δ = 0.5429), show moderate differences in deleted lines and files touched, and have slightly higher description-to-diff similarity across all measures.
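For readers unfamiliar with the effect size quoted above, the following is a minimal sketch of how Cliff's δ can be computed for two samples. It is not the authors' code: the function names `cliffs_delta` and `magnitude` are illustrative, and the magnitude cut-offs (0.147, 0.33, 0.474) are commonly cited conventions that the paper may or may not use.

```python
from bisect import bisect_left, bisect_right


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs.

    Ranges from -1 (every x is below every y) to +1 (every x is above every y).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    ys_sorted = sorted(ys)
    n = len(ys_sorted)
    x_greater = 0  # number of pairs with x > y
    x_less = 0     # number of pairs with x < y
    for x in xs:
        x_greater += bisect_left(ys_sorted, x)       # count of y strictly below x
        x_less += n - bisect_right(ys_sorted, x)     # count of y strictly above x
    return (x_greater - x_less) / (len(xs) * n)


def magnitude(delta):
    """Label |delta| using commonly cited thresholds (an assumption here)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Toy per-PR counts for two groups (made-up numbers, not from the dataset).
group_a = [3, 4, 5]
group_b = [1, 2, 3]
print(cliffs_delta(group_a, group_b), magnitude(cliffs_delta(group_a, group_b)))
```

Under these conventional thresholds, a value of 0.5429 would fall in the "large" band. Sorting one sample and using binary search avoids comparing every pair explicitly, which matters at the scale of tens of thousands of PRs.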

📝 Abstract

AI coding agents are increasingly acting as autonomous contributors by generating and submitting pull requests (PRs). However, we lack empirical evidence on how these agent-generated PRs differ from human contributions, particularly in how they modify code and describe their changes. Understanding these differences is essential for assessing their reliability and impact on development workflows. Using the MSR 2026 Mining Challenge version of the AIDev dataset, we analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). We examine additions, deletions, commits, and files touched, and evaluate the consistency between PR descriptions and their diffs using lexical and semantic similarity. Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. They also exhibit slightly higher description-to-diff similarity across all measures. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
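The abstract does not say which lexical measure is used to compare descriptions with diffs. As one hedged illustration of what a lexical description-to-diff similarity could look like, the sketch below computes Jaccard overlap between the token sets of two strings. The tokenizer, the function names, and the choice of Jaccard itself are assumptions for illustration, not the paper's method.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric/underscore tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(description, diff):
    """Jaccard similarity between the token sets of a PR description and its diff."""
    a, b = tokens(description), tokens(diff)
    if not a and not b:
        return 0.0  # convention chosen here: two empty texts score 0
    return len(a & b) / len(a | b)


# Hypothetical description and diff snippet, for illustration only.
description = "add retry logic"
diff = "retry logic added"
print(jaccard(description, diff))
```

A score near 1 means the description and the diff share most of their vocabulary, while a score near 0 means they share little. A semantic measure would instead compare embeddings of the two texts, which this sketch does not attempt.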

Problem

Research questions and friction points this paper is trying to address.

AI coding agents

pull requests

code modification

description consistency

empirical study

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI coding agents

pull requests

code modification