Refactoring with LLMs: Bridging Human Expertise and Machine Understanding

📅 2025-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Developers often neglect refactoring because of resource constraints and the lack of immediate functional returns, while existing automated tools support only a limited set of refactorings. To address this, we propose a paradigm that guides large language models (LLMs) with human best practices, specifically Fowler's refactoring catalog, to perform diverse, fine-grained refactorings across 61 distinct types. Our approach combines descriptive and rule-based instruction strategies, shifting from rigid pattern matching to goal-oriented, semantics-preserving transformation. We evaluate the method with models including GPT-mini and DeepSeek-V3 on both benchmark datasets and real-world GitHub projects, achieving full-coverage refactoring with high semantic fidelity. Rule-based instructions significantly outperform baselines on complex logical refactorings. This work establishes a systematic, empirically grounded methodology for LLM-driven refactoring that is high-quality, interpretable, and broadly applicable.

📝 Abstract
Code refactoring is a fundamental software engineering practice aimed at improving code quality and maintainability. Despite its importance, developers often neglect refactoring due to the significant time, effort, and resources it requires, as well as the lack of immediate functional rewards. Although several automated refactoring tools have been proposed, they remain limited in supporting a broad spectrum of refactoring types. In this study, we explore whether instruction strategies inspired by human best-practice guidelines can enhance the ability of Large Language Models (LLMs) to perform diverse refactoring tasks automatically. Leveraging the instruction-following and code comprehension capabilities of state-of-the-art LLMs (e.g., GPT-mini and DeepSeek-V3), we draw on Martin Fowler's refactoring guidelines to design multiple instruction strategies that encode motivations, procedural steps, and transformation objectives for 61 well-known refactoring types. We evaluate these strategies on benchmark examples and real-world code snippets from GitHub projects. Our results show that instruction designs grounded in Fowler's guidelines enable LLMs to successfully perform all benchmark refactoring types and preserve program semantics in real-world settings, an essential criterion for effective refactoring. Moreover, while descriptive instructions are more interpretable to humans, our results show that rule-based instructions often lead to better performance in specific scenarios. Interestingly, allowing models to focus on the overall goal of refactoring, rather than prescribing a fixed transformation type, can yield even greater improvements in code quality.
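To make the kind of semantics-preserving transformation the paper targets concrete, here is a minimal sketch of Fowler's Extract Function refactoring, one of the well-known catalog types the study covers. The function names and data are illustrative, not drawn from the paper's benchmarks.

```python
# Before: the function mixes a calculation with report formatting,
# the motivating smell for Fowler's "Extract Function". Names here
# are illustrative examples, not the paper's benchmark code.
def print_owing_before(orders):
    outstanding = sum(o["amount"] for o in orders)
    return f"amount owed: {outstanding}"

# After: the calculation is extracted into its own named function.
# Observable behavior is unchanged -- the semantic-fidelity criterion
# the paper uses to judge a refactoring as successful.
def calculate_outstanding(orders):
    return sum(o["amount"] for o in orders)

def print_owing_after(orders):
    outstanding = calculate_outstanding(orders)
    return f"amount owed: {outstanding}"
```

Checking that the before and after versions return identical output on the same input is exactly the kind of behavior-preservation check an effective refactoring must pass.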
Problem

Research questions and friction points this paper is trying to address.

Automating diverse code refactoring tasks using Large Language Models
Addressing limitations of current automated refactoring tools
Improving code quality through human-inspired instruction strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs automate refactoring using human-inspired instructions
Instruction strategies encode Fowler's refactoring guidelines
Rule-based instructions enhance performance in specific scenarios
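The contrast between descriptive and rule-based instruction strategies can be sketched as prompt templates. The wording below is a hypothetical reconstruction for illustration only; the paper's actual prompts are not reproduced here.

```python
# Hypothetical templates contrasting the two instruction styles.
# A descriptive instruction encodes the motivation behind a refactoring;
# a rule-based instruction encodes explicit procedural steps.
DESCRIPTIVE_TEMPLATE = (
    "Refactor the following code by applying '{refactoring}'. "
    "Motivation: {motivation}. Keep the program's behavior unchanged.\n\n"
    "{code}"
)

RULE_BASED_TEMPLATE = (
    "Apply '{refactoring}' to the code below by following these steps:\n"
    "{steps}\n"
    "Do not change observable behavior.\n\n"
    "{code}"
)

def build_prompt(style, refactoring, code, motivation="", steps=()):
    """Render one of the two instruction styles for a given refactoring type."""
    if style == "descriptive":
        return DESCRIPTIVE_TEMPLATE.format(
            refactoring=refactoring, motivation=motivation, code=code)
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return RULE_BASED_TEMPLATE.format(
        refactoring=refactoring, steps=numbered, code=code)
```

For example, a rule-based prompt for Extract Function would enumerate steps such as creating the new function, moving the fragment, and replacing it with a call, whereas the descriptive variant would only state the motivation and the behavior-preservation constraint.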
Yonnel Chen Kuang Piao
Department of Computer Engineering and Software Engineering, Polytechnique Montreal, Montreal, QC, Canada
Jean Carlors Paul
Department of Computer Engineering and Software Engineering, Polytechnique Montreal, Montreal, QC, Canada
Leuson Da Silva
Postdoctoral Fellow - Polytechnique Montreal
Software Engineering · Generative AI · Empirical Studies · Code Integration
Arghavan Moradi Dakhel
Department of Computer Engineering and Software Engineering, Polytechnique Montreal, Montreal, QC, Canada
Mohammad Hamdaqa
Associate Professor, Polytechnique Montreal
Software Engineering · Software Auditing · Software Analytics · AIOps · Model-Driven Engineering
Foutse Khomh
NSERC Arthur B. McDonald Fellow, CRC Tier 1, Canada CIFAR AI Chair, FRQ-IVADO Chair, Full Professor
Software engineering · Machine learning systems engineering · Mining software repositories · Reverse