LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution

📅 2025-08-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current search-based genetic improvement (GI) approaches operate primarily at the syntactic level and lack semantic understanding. Large language models (LLMs) can generate semantic-aware edits, but they lack goal-directed feedback and control over those edits. To address both limitations, the paper proposes PatchCat, a framework that automatically clusters LLM-generated patches and classifies them into 18 patch categories, which supports predicting NoOp edits in advance and, prospectively, skipping test suite execution for them. PatchCat augments GI with automated clustering of LLM edits so that semantic signals can guide the search, and it works with small, local LLMs. The initial empirical evidence indicates that PatchCat categorizes newly suggested patches with high accuracy, a promising step toward interpretable, efficient, and green GI.

📝 Abstract

Genetic Improvement (GI) of software automatically creates alternative software versions that are improved according to certain properties of interest (e.g., running time). Search-based GI excels at navigating large program spaces, but operates primarily at the syntactic level. In contrast, Large Language Models (LLMs) offer semantic-aware edits, yet lack goal-directed feedback and control (which is instead a strength of GI). As such, we propose the investigation of a new research line on AI-powered GI aimed at incorporating semantic-aware search. We take a first step toward it by augmenting GI with automated clustering of LLM edits. We provide initial empirical evidence that our proposal, dubbed PatchCat, allows us to automatically and effectively categorize LLM-suggested patches. PatchCat identified 18 different types of software patches and categorized newly suggested patches with high accuracy. It also enabled detecting NoOp edits in advance and, prospectively, skipping test suite execution in many cases to save resources. These results, coupled with the fact that PatchCat works with small, local LLMs, are a promising step toward interpretable, efficient, and green GI. We outline a rich agenda of future work and call for the community to join our vision of building a principled understanding of LLM-driven mutations and guiding the GI search process with semantic signals.
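To make the test-skipping idea concrete, here is a minimal Python sketch of how a NoOp prediction could gate test execution inside a GI loop. It is not PatchCat's classifier: `is_predicted_noop` is a hypothetical stand-in that only compares Python ASTs, and `evaluate_patch`, `baseline_fitness`, and `run_tests` are assumed names introduced for illustration.

```python
import ast
from typing import Callable, Tuple


def is_predicted_noop(original: str, patched: str) -> bool:
    """Stand-in NoOp predictor (an assumption, not PatchCat's model).

    Returns True when both versions parse to the same AST, which catches
    edits that only touch comments, whitespace, or formatting.
    """
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(patched))
    except SyntaxError:
        # An unparsable patch is not a NoOp; leave it to the normal pipeline.
        return False


def evaluate_patch(
    original: str,
    patched: str,
    baseline_fitness: float,
    run_tests: Callable[[str], float],
) -> Tuple[float, bool]:
    """Return (fitness, tests_were_run) for one candidate patch.

    If the patch is predicted to be a NoOp, reuse the fitness of the
    original program and skip the (expensive) test suite.
    """
    if is_predicted_noop(original, patched):
        return baseline_fitness, False
    return run_tests(patched), True
```

In this sketch a comment-only edit reuses the baseline fitness without calling `run_tests`, while an edit that changes an expression still goes through the full test suite. A learned patch classifier would take the place of the AST comparison.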

Problem

Research questions and friction points this paper is trying to address.

Incorporating semantic awareness into automated software improvement

Categorizing and controlling LLM-generated patches for efficient evolution

Enabling interpretable and resource-saving genetic improvement techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided genetic improvement for software

Automated clustering of LLM edits

Semantic-aware categorization of software patches
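
The summary above does not say how the clustering of LLM edits is performed. As a rough illustration of the general idea only, the Python sketch below groups patches by lexical overlap of their changed lines, using Jaccard similarity and a greedy threshold. The functions `edit_tokens`, `jaccard`, and `cluster_patches`, the diff-style input format, and the `threshold` value are all assumptions made for this example; a semantic approach would likely replace the token sets with model-derived representations.

```python
import re
from typing import FrozenSet, List, Sequence


def edit_tokens(patch: str) -> FrozenSet[str]:
    """Collect tokens from the added/removed lines of a diff-style patch."""
    tokens = set()
    for line in patch.splitlines():
        is_change = line.startswith(("+", "-"))
        is_file_header = line.startswith(("+++", "---"))
        if is_change and not is_file_header:
            # Identifiers as whole tokens, everything else per character.
            tokens.update(re.findall(r"[A-Za-z_]\w*|\S", line[1:]))
    return frozenset(tokens)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_patches(patches: Sequence[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering: each patch joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one.
    Returns clusters as lists of patch indices.
    """
    clusters = []  # list of (representative_tokens, member_indices)
    for index, patch in enumerate(patches):
        tokens = edit_tokens(patch)
        for representative, members in clusters:
            if jaccard(tokens, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((tokens, [index]))
    return [members for _, members in clusters]
```

Each resulting cluster could then be inspected and given a category label, which is the step that makes the patch types interpretable. The greedy scheme is order-dependent and is chosen here only for brevity.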

🔎 Similar Papers

Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond