The Impact of Large Language Models (LLMs) on Code Review Process

📅 2025-08-14

🤖 AI Summary

Problem: Empirical evidence on how large language models (LLMs), particularly GPT, affect real-world GitHub code review, especially across its distinct stages, remains scarce. Method: The study identifies GPT-assisted pull requests (PRs) with a semi-automated approach that combines keyword detection, regex-based filtering, and manual validation. It then uses multiple linear regression and the Mann-Whitney U test to quantify GPT's effect on overall resolution time and on stage-specific review durations. Contribution/Results: GPT-assisted PRs show a reduction of more than 60% in median resolution time (9 hours versus 23 hours), a 33% decrease in review-phase duration, and an 87% reduction in waiting time before acceptance. An analysis of 300 GPT-assisted PRs identifies three characteristic usage patterns: code optimization, bug fixing, and documentation updates. The work is presented as the first stage-wise empirical evaluation of LLMs in authentic code review settings, and it offers data-driven insights for AI-augmented development workflows.

📝 Abstract

Large language models (LLMs) have recently gained prominence in the field of software development, significantly boosting productivity and simplifying teamwork. Although prior studies have examined task-specific applications, the phase-specific effects of LLM assistance on the efficiency of code review processes remain underexplored. This research investigates the effect of GPT on GitHub pull request (PR) workflows, with a focus on reducing resolution time, optimizing phase-specific performance, and assisting developers. We curated a dataset of 25,473 PRs from 9,254 GitHub projects and identified GPT-assisted PRs using a semi-automated heuristic approach that combines keyword-based detection, regular expression filtering, and manual verification, iterating until labeling accuracy reached 95%. We then applied statistical modeling, including multiple linear regression and the Mann-Whitney U test, to evaluate differences between GPT-assisted and non-assisted PRs, both at the overall resolution level and across distinct review phases. Our results indicate that early adoption of GPT can substantially improve the efficiency of the PR process, saving considerable time at several stages. GPT-assisted PRs had a median resolution time more than 60% lower than non-assisted PRs (9 hours compared to 23 hours). We also found that using GPT can reduce review time by 33% and waiting time before acceptance by 87%. In a sample of 300 GPT-assisted PRs, developers predominantly used GPT for code optimization (60%), bug fixing (26%), and documentation updates (12%). This research sheds light on the impact of the GPT model on the code review process, offering actionable insights for software teams seeking to enhance workflows and promote seamless collaboration.
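
The semi-automated labeling step described in the abstract (keyword detection, then regular expression filtering, then manual verification) might look roughly like the sketch below. The abstract does not list the actual keywords or filters, so `GPT_PATTERN`, `EXCLUDE_PATTERN`, and `label_pr` are hypothetical names and patterns chosen only for illustration. The exclusion rule assumes one plausible source of false positives: "GPT" also abbreviates GUID Partition Table.

```python
import re

# Hypothetical keyword pattern: matches "GPT", "ChatGPT", "GPT-4", "GPT-3.5", "GPT-4o".
# The keyword list actually used in the study is not given in this summary.
GPT_PATTERN = re.compile(
    r"\b(?:chat[\s-]?)?gpt(?:-?\d(?:\.\d)?o?)?\b", re.IGNORECASE
)

# Hypothetical exclusion filter for an assumed false-positive source:
# "GPT" as in GUID Partition Table.
EXCLUDE_PATTERN = re.compile(r"\b(?:partition|disk|uefi|mbr)\b", re.IGNORECASE)


def label_pr(text: str) -> str:
    """Classify PR text as 'candidate', 'excluded', or 'no_match'.

    'candidate' only means the PR is queued for manual verification;
    it is not a final GPT-assisted label.
    """
    if not GPT_PATTERN.search(text):
        return "no_match"
    if EXCLUDE_PATTERN.search(text):
        return "excluded"
    return "candidate"


if __name__ == "__main__":
    samples = [
        "Refactored the parser with ChatGPT suggestions",
        "Fix GPT partition table parsing on UEFI disks",
        "Bump dependency versions",
    ]
    for s in samples:
        print(label_pr(s), "|", s)
```

In a pipeline like this, only the "candidate" PRs would go to human reviewers, which keeps the manual step small while the patterns are refined toward the target labeling accuracy.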

Problem

Research questions and friction points this paper is trying to address.

Investigates GPT's effect on GitHub PR workflows

Measures time reduction in code review phases

Analyzes developer usage patterns of GPT assistance
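
The phase-level time measurement listed above can be illustrated with a small sketch. The phase boundaries used here (creation, first review, approval, merge) and the field names are assumptions made for this example and may not match the paper's own phase definitions. Only the 9-hour and 23-hour medians come from the abstract.

```python
from datetime import datetime
from statistics import median


def hours_between(start: str, end: str) -> float:
    """Elapsed hours between two ISO 8601 timestamps (no timezone suffix)."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600.0


def phase_durations(pr: dict) -> dict:
    """Split one PR's lifetime into hypothetical phases, in hours."""
    return {
        "wait_before_review": hours_between(pr["created_at"], pr["first_review_at"]),
        "review": hours_between(pr["first_review_at"], pr["approved_at"]),
        "wait_before_merge": hours_between(pr["approved_at"], pr["merged_at"]),
        "resolution": hours_between(pr["created_at"], pr["merged_at"]),
    }


def median_by_phase(prs: list) -> dict:
    """Median duration of each phase across a non-empty list of PRs."""
    durations = [phase_durations(pr) for pr in prs]
    return {phase: median(d[phase] for d in durations) for phase in durations[0]}


def percent_reduction(baseline: float, treated: float) -> float:
    """Relative reduction of `treated` versus `baseline`, in percent."""
    return 100.0 * (baseline - treated) / baseline


# Made-up PR used only to exercise the functions.
example_pr = {
    "created_at": "2025-01-01T08:00:00",
    "first_review_at": "2025-01-01T10:00:00",
    "approved_at": "2025-01-01T15:00:00",
    "merged_at": "2025-01-01T17:00:00",
}

if __name__ == "__main__":
    print(phase_durations(example_pr))
    # Medians reported in the abstract: 9 hours (GPT-assisted) vs 23 hours.
    print(round(percent_reduction(23, 9), 1))
```

Applying `percent_reduction` to the reported medians gives (23 - 9) / 23, which is consistent with the "more than 60%" figure in the abstract.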

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stage-wise evaluation of GPT-assisted GitHub pull request workflows

Semi-automated heuristic PR labeling

Statistical modeling for PR efficiency
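
One of the statistical tools named in the abstract, the Mann-Whitney U test, compares two groups of durations without assuming they are normally distributed. Below is a minimal pure-Python sketch using average ranks for ties and a normal approximation with tie correction but no continuity correction. This is an illustrative implementation, not the authors' analysis code, and the sample durations are made up. For real analyses, an established statistics library would be the safer choice.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Ties receive average ranks and the variance is tie-corrected.
    No continuity correction is applied, so small-sample p-values
    are only approximate.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        t = j - i                     # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


if __name__ == "__main__":
    # Made-up resolution times in hours, for illustration only.
    gpt_assisted = [5, 7, 9, 11, 8]
    non_assisted = [20, 23, 30, 18, 25]
    u, p = mann_whitney_u(gpt_assisted, non_assisted)
    print(u, p)
```

A U value near 0 for the first group means almost all of its durations rank below those of the second group, which is the pattern a shorter-resolution-time finding would produce.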
