AI Summary
Current search-based genetic improvement (GI) approaches operate primarily at the syntactic level and lack semantic understanding. Large language models (LLMs) can generate semantically aware edits, but they lack goal-directed feedback and controllable edit mechanisms. To address these limitations, we propose PatchCat, a framework that automatically clusters LLM-generated patches and classifies new ones into 18 fine-grained patch categories, which supports predicting NoOp edits and skipping test execution for them. PatchCat pairs this semantic categorization of LLM edits with the search-guided feedback of GI, and it works with small, local LLMs. Initial empirical evidence indicates that PatchCat categorizes newly suggested patches with high accuracy and can detect NoOp edits in advance, which could reduce test execution overhead and make GI more efficient and interpretable.
Abstract
Genetic Improvement (GI) of software automatically creates alternative software versions that are improved according to certain properties of interest (e.g., running time). Search-based GI excels at navigating large program spaces, but operates primarily at the syntactic level. In contrast, Large Language Models (LLMs) offer semantic-aware edits, yet lack goal-directed feedback and control (which is instead a strength of GI). As such, we propose the investigation of a new research line on AI-powered GI aimed at incorporating semantic-aware search. We take a first step in this direction by augmenting GI with automated clustering of LLM edits. We provide initial empirical evidence that our proposal, dubbed PatchCat, allows us to automatically and effectively categorize LLM-suggested patches. PatchCat identified 18 different types of software patches and categorized newly suggested patches with high accuracy. It also enabled detecting NoOp edits in advance and, prospectively, skipping test suite execution in many cases to save resources. These results, coupled with the fact that PatchCat works with small, local LLMs, are a promising step toward interpretable, efficient, and green GI. We outline a rich agenda of future work and call for the community to join our vision of building a principled understanding of LLM-driven mutations, guiding the GI search process with semantic signals.
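The test-skipping idea in the abstract can be illustrated with a minimal sketch. It is not PatchCat's implementation: the abstract does not describe how the categorizer works, so the sketch substitutes a simple AST-equality check as a stand-in NoOp detector, and the names `is_noop`, `evaluate_patch`, `baseline_fitness`, and `run_tests` are hypothetical. It only shows how a GI loop might reuse the baseline fitness when a patch is predicted to be a NoOp.

```python
import ast
from typing import Callable, Tuple


def is_noop(original: str, patched: str) -> bool:
    """Stand-in NoOp detector (assumption, not PatchCat's learned categorizer).

    Treats a patch as a NoOp when both sources parse to the same AST,
    which ignores changes to comments and whitespace only.
    """
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(patched))
    except SyntaxError:
        # An unparsable variant is not a NoOp; leave the verdict to the tests.
        return False


def evaluate_patch(
    original: str,
    patched: str,
    baseline_fitness: float,
    run_tests: Callable[[str], float],
) -> Tuple[float, bool]:
    """Return (fitness, tests_were_run) for one candidate patch.

    If the patch is predicted to be a NoOp, the baseline fitness is reused
    and the (expensive) test suite is skipped.
    """
    if is_noop(original, patched):
        return baseline_fitness, False
    return run_tests(patched), True
```

In a real pipeline the `is_noop` check would be replaced by the prediction of a patch categorizer, and a misprediction would cost a missed improvement, so the trade-off between saved test runs and classification accuracy matters.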