AI Summary
This work addresses a limitation of current large language models in automated vulnerability repair: they often lack mechanisms to accumulate and reuse historical repair experience, which leads to repeated errors. To overcome this, the authors propose EvoRepair, a framework that they describe as the first to introduce an experience-based self-evolution mechanism for this task. Through a "learn-and-repair" loop, EvoRepair lets an agent retrieve and apply relevant historical repairs, extract new knowledge from repair trajectories, and dynamically update its experience repository using a quality-aware scoring strategy, thereby achieving continuous, cross-vulnerability knowledge evolution. Evaluated on PATCHEVAL and SEC-bench, EvoRepair achieves repair success rates of 93.47% and 87.00%, respectively (90.46% overall), substantially outperforming existing baselines. Its robustness across models, programming languages, and datasets is further validated on VUL4J.
Abstract
Large Language Models (LLMs) have shown promise for automated vulnerability repair (AVR), but they still face several limitations, including the lack of intra-vulnerability experience accumulation and the lack of cross-vulnerability experience reuse. As a result, LLMs may repeatedly make similar mistakes during iterative repair and underutilize valuable repair knowledge from historical vulnerabilities. To address these challenges, we propose EvoRepair, the first experience-based self-evolving AVR agent framework that enables LLMs to accumulate, refine, and leverage domain-specific knowledge across long-horizon vulnerability repairs. EvoRepair follows a cyclic learn-and-repair process that retrieves relevant past experiences to guide repair, extracts new experiences from repair trajectories, and updates an experience bank using quality-aware scoring. We evaluate EvoRepair against 12 representative vulnerability repair baselines on PATCHEVAL and SEC-bench using GPT-5-mini. Results show that EvoRepair achieves the best overall performance, reaching 93.47% on PATCHEVAL, 87.00% on SEC-bench, and 90.46% overall. In particular, EvoRepair outperforms the latest LLM-based baseline, LoopRepair, by 39.56% and 33.50% on PATCHEVAL and SEC-bench, respectively, and surpasses IntentFix by 70.86% and 50.50%. Across both benchmarks, EvoRepair also exceeds the recent self-evolving agent Live-SWE-Agent by 6.98% overall. Additional transfer experiments on VUL4J further demonstrate the robustness of EvoRepair across models, programming languages, and datasets. These findings show that experience-based self-evolution substantially strengthens agentic AVR and goes beyond existing self-evolving techniques.
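The learn-and-repair cycle described in the abstract (retrieve past experiences, attempt a repair, extract a new experience, update the bank by quality) can be illustrated with a minimal Python sketch. This is not EvoRepair's implementation: every name here (`Experience`, `ExperienceBank`, `learn_and_repair`, `attempt_repair`) is hypothetical, keyword overlap stands in for whatever retrieval the paper uses, and a simple moving-average score stands in for its quality-aware scoring.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Experience:
    lesson: str           # distilled repair knowledge (hypothetical format)
    keywords: frozenset   # tags used for retrieval
    score: float = 0.5    # quality estimate in [0, 1]


class ExperienceBank:
    """Toy experience bank; an assumption, not the paper's data structure."""

    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self.items: list[Experience] = []

    def retrieve(self, keywords: Iterable[str], k: int = 3) -> list[Experience]:
        query = frozenset(keywords)

        def relevance(exp: Experience) -> float:
            # Jaccard overlap weighted by the stored quality score.
            union = query | exp.keywords
            overlap = len(query & exp.keywords) / len(union) if union else 0.0
            return overlap * exp.score

        ranked = sorted(self.items, key=relevance, reverse=True)
        return [e for e in ranked[:k] if relevance(e) > 0]

    def reinforce(self, exp: Experience, success: bool, rate: float = 0.3) -> None:
        # Move the score toward 1 after a success and toward 0 after a failure.
        target = 1.0 if success else 0.0
        exp.score += rate * (target - exp.score)

    def add(self, exp: Experience) -> None:
        # Keep only the highest-scoring experiences once capacity is exceeded.
        self.items.append(exp)
        self.items.sort(key=lambda e: e.score, reverse=True)
        del self.items[self.capacity:]


def learn_and_repair(
    bank: ExperienceBank,
    keywords: Iterable[str],
    attempt_repair: Callable[[list, list], tuple],
    max_attempts: int = 3,
) -> bool:
    """attempt_repair(hints, failures) -> (success, lesson) stands in for the LLM agent."""
    keywords = frozenset(keywords)
    failures: list[str] = []  # intra-vulnerability memory of failed attempts
    for _ in range(max_attempts):
        hints = bank.retrieve(keywords)
        success, lesson = attempt_repair([h.lesson for h in hints], failures)
        for h in hints:
            bank.reinforce(h, success)  # score retrieved experiences by outcome
        if success:
            # Store the new experience for reuse on later vulnerabilities.
            bank.add(Experience(lesson, keywords, score=0.7))
            return True
        failures.append(lesson)
    return False
```

In this sketch, failed attempts are only carried forward within the current vulnerability, while successful lessons are written to the bank for cross-vulnerability reuse; whether the actual framework also stores lessons from failures is not stated in the abstract.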