AIDev: Studying AI Coding Agents on GitHub

📅 2026-02-09
📈 Citations: 5
Influential: 0
🤖 AI Summary
This study addresses the scarcity of large-scale, real-world data on AI coding agents and their effect on software engineering practice. To bridge this gap, we present AIDev, the first large-scale dataset capturing the adoption and collaboration patterns of five prominent AI coding agents (OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code) in real development environments. Constructed via the GitHub API and multi-source tracing techniques, AIDev comprises 932,791 pull requests authored by these agents across 116,211 repositories and 72,189 developers. A curated subset of 33,596 pull requests from 2,807 repositories with over 100 stars additionally includes the interaction context: comments, reviews, commits, and linked issues. The dataset provides an empirical foundation for studying AI adoption, human-AI collaboration, and productivity effects in software development.

📝 Abstract
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.
Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering
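
The star-based selection behind the curated subset can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields (`pr_id`, `repo`, `agent`), the `stars_by_repo` mapping, and the toy data are all hypothetical and do not reflect the actual AIDev schema. The only thing taken from the abstract is the criterion itself, namely repositories with over 100 stars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical fields for illustration; not the real AIDev schema.
    pr_id: int
    repo: str
    agent: str


def curated_subset(prs, stars_by_repo, min_stars=100):
    """Keep PRs whose repository has strictly more than min_stars stars.

    Repositories missing from stars_by_repo are treated as having 0 stars.
    """
    return [pr for pr in prs if stars_by_repo.get(pr.repo, 0) > min_stars]


# Toy data (made up) showing the boundary: exactly 100 stars is excluded.
example_prs = [
    PullRequest(1, "org/popular", "Devin"),
    PullRequest(2, "org/borderline", "Cursor"),
    PullRequest(3, "org/unknown", "Claude Code"),
]
example_stars = {"org/popular": 2500, "org/borderline": 100}

selected = curated_subset(example_prs, example_stars)
```

With this toy input, only the PR from `org/popular` is selected, because "over 100 stars" is read here as a strict inequality. That reading is an assumption about the abstract's wording.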
Problem

Research questions and friction points this paper is trying to address.

AI coding agents
real-world usage
software engineering
dataset gap
Agentic-PRs
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI coding agent
Agentic-PR
AIDev dataset
human-AI collaboration
software engineering