PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the real-world impact of large language models (LLMs) on GitHub pull request (PR) decisions, specifically examining developers' self-reported use of ChatGPT to generate code patches and its effect on patch integration or rejection. Method: We introduce SACU (Self-Admitted ChatGPT Usage), an empirical paradigm for identifying LLM-assisted patches via developer self-reporting, and develop PatchTrack, a mixed-methods framework combining qualitative coding with quantitative analysis across 645 ChatGPT-generated code snippets from 338 PRs. Contribution/Results: We find a median patch integration rate of only 25% and highly selective adoption: full acceptance of ChatGPT-generated code is rare. Primary rejection reasons include requirement-scope mismatch, poor maintainability, redundant solutions, and process-compliance barriers. This work provides the first systematic, empirically grounded account of the adoption bottlenecks and decision logic for LLM-generated code in collaborative software development, offering evidence-based insights for optimizing AI-augmented development practices and tool design.

📝 Abstract
The rapid adoption of large language models (LLMs) like ChatGPT in software development has introduced new ways for developers to interact with AI, particularly in pull request workflows. While prior research has examined AI-generated code quality, there is limited understanding of how ChatGPT is utilized in real-world pull request decision-making and how its suggestions influence patch integration and rejection. To explore these aspects, we analyze self-admitted ChatGPT usage (SACU), where developers explicitly disclose their reliance on ChatGPT within pull request discussions. Our study examines 338 pull requests (285 merged, 53 closed) across 255 GitHub repositories, containing 645 ChatGPT-generated code snippets and 3,486 patches. We introduce PatchTrack, a classification tool that determines whether ChatGPT-generated patches were applied (PA, 115 cases), not applied (PN, 64 cases), or not suggested (NE, 106 cases). Our findings reveal that full adoption of ChatGPT-generated code is rare: with a median integration rate of 25%, developers frequently modify or selectively integrate AI-generated patches to align with project constraints. Through qualitative analysis, we identify key factors influencing patch integration and pull request rejection, including scope misalignment, maintainability concerns, redundant solutions, and procedural barriers such as incomplete documentation or administrative policies. By providing empirical insights into ChatGPT's role in pull request workflows, this study informs developers, maintainers, and educators on the evolving use of generative AI in collaborative software development. It also lays the groundwork for future research on optimizing AI-assisted development, improving transparency in AI adoption, and enhancing patch integration workflows.
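To make the "median integration rate" metric concrete, here is a minimal sketch of how per-PR integration rates and their median could be computed from patch-level labels in the paper's PA/PN scheme. The data structure and counts below are illustrative assumptions, not the study's actual dataset, and `integration_rate` is a hypothetical helper, not part of PatchTrack.

```python
from statistics import median

# Hypothetical per-PR counts of ChatGPT-suggested patches:
# "applied" mirrors PA, "not_applied" mirrors PN. Toy numbers only.
prs = [
    {"applied": 0, "not_applied": 4},  # nothing adopted
    {"applied": 1, "not_applied": 3},  # selective adoption
    {"applied": 2, "not_applied": 2},  # partial adoption
]

def integration_rate(pr):
    """Fraction of suggested patches that were applied: PA / (PA + PN)."""
    suggested = pr["applied"] + pr["not_applied"]
    return pr["applied"] / suggested if suggested else 0.0

rates = [integration_rate(pr) for pr in prs]
print(median(rates))  # 0.25 for this toy data
```

Note that patches labeled NE (not suggested as concrete code) would be excluded from the denominator under this reading, since there is no patch to apply.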
Problem

Research questions and friction points this paper is trying to address.

Analyzes ChatGPT's impact on pull request decision-making and outcomes
Investigates how ChatGPT-generated code influences patch integration and rejection
Identifies key factors affecting AI-generated patch adoption in software development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes self-admitted ChatGPT usage in pull requests
Introduces PatchTrack for classifying AI-generated patches
Identifies key factors influencing patch integration decisions