Agentic Refactoring: An Empirical Study of AI Coding Agents

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work lacks an empirical understanding of how AI coding agents (e.g., Codex, Claude Code, Cursor) refactor code in real-world settings. Method: This paper presents the first large-scale empirical study, analyzing 14,988 commits across 12,256 pull requests in open-source Java projects from the AIDev dataset. We integrate automated code quality metrics with manual classification of coding intents to systematically identify AI-driven refactoring activities and their underlying motivations. Contribution/Results: We find that 26.1% of AI-generated commits explicitly target refactoring, predominantly localized consistency improvements such as variable renaming and type adjustments, motivated primarily by maintainability and readability. Refactoring yields small but statistically significant structural improvements: the median Class LOC delta is −15.25, and cyclomatic complexity is reduced. This study provides the first large-scale, project-based empirical evidence and behavioral characterization of AI-assisted refactoring, advancing sustainable software development through empirically grounded insights.

📝 Abstract
Agentic coding tools, such as OpenAI Codex, Claude Code, and Cursor, are transforming the software engineering landscape. These AI-powered systems function as autonomous teammates capable of planning and executing complex development tasks. Agents have become active participants in refactoring, a cornerstone of sustainable software development aimed at improving internal code quality without altering observable behavior. Despite their increasing adoption, there is a critical lack of empirical understanding regarding how agentic refactoring is utilized in practice, how it compares to human-driven refactoring, and what impact it has on code quality. To address this empirical gap, we present a large-scale study of AI agent-generated refactorings in real-world open-source Java projects, analyzing 15,451 refactoring instances across 12,256 pull requests and 14,988 commits derived from the AIDev dataset. Our empirical analysis shows that refactoring is a common and intentional activity in this development paradigm, with agents explicitly targeting refactoring in 26.1% of commits. Analysis of refactoring types reveals that agentic efforts are dominated by low-level, consistency-oriented edits, such as Change Variable Type (11.8%), Rename Parameter (10.4%), and Rename Variable (8.5%), reflecting a preference for localized improvements over the high-level design changes common in human refactoring. Additionally, the motivations behind agentic refactoring focus overwhelmingly on internal quality concerns, with maintainability (52.5%) and readability (28.1%) as the leading drivers. Furthermore, quantitative evaluation of code quality metrics shows that agentic refactoring yields small but statistically significant improvements in structural metrics, particularly for medium-level changes, reducing class size and complexity (e.g., Class LOC median $\Delta = -15.25$).
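The abstract's before/after metric comparison can be sketched as a paired analysis of per-class LOC. The sketch below uses a simple two-sided sign test on hypothetical values; the numbers are illustrative, not the paper's data, and the paper's exact statistical test is not specified here.

```python
from statistics import median
from math import comb

# Hypothetical per-class LOC before and after agentic refactoring
# commits (illustrative values only).
loc_before = [320, 185, 410, 95, 270, 150, 500, 220]
loc_after  = [290, 170, 380, 92, 255, 140, 460, 210]

deltas = [a - b for b, a in zip(loc_before, loc_after)]
med = median(deltas)  # negative => classes shrank on average

# Two-sided sign test: under H0 (no systematic change), each
# nonzero delta is equally likely to be positive or negative.
neg = sum(d < 0 for d in deltas)
n = sum(d != 0 for d in deltas)
k = min(neg, n - neg)
p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n)

print(f"median Class LOC delta: {med}, sign-test p = {p:.4f}")
```

With all eight deltas negative, the sign test already flags a significant shrinkage; a Wilcoxon signed-rank test would additionally weight the magnitude of each delta.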
Problem

Research questions and friction points this paper is trying to address.

Understanding how AI coding agents perform refactoring in real software projects
Comparing AI-driven refactoring approaches with human-driven refactoring practices
Evaluating the impact of agentic refactoring on internal code quality metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

First large-scale, project-based empirical study of agentic refactoring, covering 15,451 refactoring instances from the AIDev dataset
Integration of automated code quality metrics with manual classification of coding intents to identify refactoring activities and motivations
Quantitative evidence that agentic refactoring reduces class size and cyclomatic complexity
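The intent-identification step above can be approximated with a keyword heuristic over commit messages. This is a simplistic proxy for the paper's manual classification, and the keyword list below is an assumption, not the authors' coding scheme.

```python
import re

# Hypothetical keyword proxy for refactoring intent in commit
# messages (the paper classifies intents manually).
REFACTOR_PATTERN = re.compile(
    r"\b(refactor|rename|cleanup|clean up|simplify|restructure)\b",
    re.IGNORECASE,
)

def is_refactoring_intent(message: str) -> bool:
    """Flag commits whose message explicitly signals refactoring."""
    return bool(REFACTOR_PATTERN.search(message))

# Illustrative commit messages, not drawn from the AIDev dataset.
commits = [
    "Refactor UserService to extract validation helper",
    "Add OAuth2 login support",
    "Rename parameter `cfg` to `config` for clarity",
    "Fix NPE in payment flow",
]
share = sum(map(is_refactoring_intent, commits)) / len(commits)
print(f"{share:.0%} of commits target refactoring")
```

A keyword filter like this over- and under-counts (e.g., "rename" in a bug-fix message), which is why the study pairs automated detection with manual intent review.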