🤖 AI Summary
This study addresses an open question: do AI agents genuinely improve code readability when they refactor? Prior work lacks a systematic evaluation focused specifically on readability outcomes. As the first study to focus on readability-oriented refactoring by AI agents, it analyzes 403 commits from the AIDev dataset whose messages contain readability-related keywords. Using multiple readability metrics, including the Maintainability Index and Cyclomatic Complexity, it quantifies how the code changes before and after each refactoring commit. The results show that AI agents tend to prioritize reducing logic complexity and improving documentation over surface-level changes such as naming or formatting. Notably, the Maintainability Index decreased in 56.1% of commits and Cyclomatic Complexity increased in 42.7%, suggesting that AI-driven refactoring does not consistently improve conventional code quality indicators.
📝 Abstract
Code readability is fundamental to software quality and maintainability. Poor readability extends development time, raises the risk of introducing bugs, and contributes to technical debt. With the rapid advancement of Large Language Models, AI agent-based approaches have emerged as a promising paradigm for automated refactoring, capable of decomposing complex tasks through autonomous planning and execution. Prior studies have examined refactoring by AI agents, but their analyses cover all forms of refactoring, including performance optimization and structural improvement. As a result, the extent to which AI agent-based refactoring specifically improves code readability remains unclear.
This study investigates the impact of AI agent-based refactoring on code readability. We extracted 403 commits containing readability-related keywords from the AIDev dataset and measured how multiple quantitative readability metrics changed before and after each commit. Our results indicate that AI agents primarily target logic complexity (42.4%) and documentation improvements (24.2%) rather than surface-level aspects such as naming conventions or formatting. However, contrary to expectations, readability-focused commits often degraded traditional quality metrics: the Maintainability Index decreased in 56.1% of commits, while Cyclomatic Complexity increased in 42.7%.
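The before/after comparison described above can be illustrated with a minimal sketch. It is not the study's tooling: the node types counted as decision points, the function names `cyclomatic_complexity` and `share_with_increase`, and the sample `BEFORE`/`AFTER` snippets are all assumptions made for illustration, and it handles only Python source. It approximates Cyclomatic Complexity as one plus the number of decision points, then reports the fraction of commits in which that value rose. The Maintainability Index is not covered here.

```python
import ast

# Node types treated as decision points. This choice is an assumption;
# real metric tools differ in which constructs they count.
DECISION_NODES = (
    ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands, hence two extra branches.
            complexity += len(node.values) - 1
    return complexity


def share_with_increase(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (before, after) source pairs whose complexity went up."""
    if not pairs:
        return 0.0
    worse = sum(
        1 for before, after in pairs
        if cyclomatic_complexity(after) > cyclomatic_complexity(before)
    )
    return worse / len(pairs)


# Hypothetical commit: a chained comparison replaces an `if` with `and`.
BEFORE = "def f(x):\n    if x > 0 and x < 10:\n        return 1\n    return 0\n"
AFTER = "def f(x):\n    return int(0 < x < 10)\n"

if __name__ == "__main__":
    print(cyclomatic_complexity(BEFORE), cyclomatic_complexity(AFTER))
    print(share_with_increase([(BEFORE, AFTER), (AFTER, BEFORE)]))
```

Comparing per-commit values rather than dataset-wide averages matches the way the abstract reports its results, as the share of commits in which a metric moved in a given direction.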