🤖 AI Summary
This work proposes a self-repair framework for code generated by large language models, which frequently suffers from security vulnerabilities, logical errors, and compilation failures. The approach integrates fine-grained feedback from multiple tools—including compiler diagnostics, CodeQL static analysis, and KLEE symbolic execution—with retrieval-augmented generation leveraging a lightweight semantic embedding model to retrieve historically successful repair examples. This synergy guides the model to iteratively refine its outputs. The method substantially enhances the model’s ability to autonomously fix security flaws, particularly excelling with large models that are difficult to fine-tune. Experimental results show a 96% reduction in vulnerabilities for DeepSeek-Coder-1.3B and a drop in critical security defect rates from 58.55% to 22.19% for CodeLlama-7B.
📝 Abstract
Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substantially from structured feedback, static analysis, retrieval augmentation, and execution-based refinement. We propose a retrieval-augmented, multi-tool repair workflow in which a single code-generating LLM iteratively refines its outputs using compiler diagnostics, CodeQL security scanning, and KLEE symbolic execution. A lightweight embedding model is used for semantic retrieval of previously successful repairs, providing security-focused examples that guide generation. Evaluated on a combined dataset of 3,242 programs generated by DeepSeek-Coder-1.3B and CodeLlama-7B, the system demonstrates significant improvements in robustness. For DeepSeek, security vulnerabilities were reduced by 96%. For the larger CodeLlama model, the critical security defect rate decreased from 58.55% to 22.19%, highlighting the efficacy of tool-assisted self-repair even on "stubborn" models.
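The iterative workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `RepairMemory`, `run_checkers`, and `repair_loop` are hypothetical, the checkers are plain callables standing in for the compiler, CodeQL, and KLEE, the `generate` callable stands in for the code LLM, and word overlap replaces the embedding-based retrieval. The step that stores a repair once all checkers pass is likewise an assumption about how "previously successful repairs" might be collected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A checker maps source code to a list of diagnostic messages.
# In the paper's setting these would wrap the compiler, CodeQL, and KLEE;
# here they are arbitrary callables (an assumption for illustration).
Checker = Callable[[str], List[str]]
# The generator stands in for the LLM: (code, diagnostics, examples) -> new code.
Generator = Callable[[str, List[str], List[str]], str]


@dataclass
class RepairMemory:
    """Hypothetical store of (diagnostic text, repaired code) pairs."""
    examples: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        # Stand-in for embedding similarity: rank by shared-word count.
        words = set(query.lower().split())
        ranked = sorted(
            self.examples,
            key=lambda ex: len(words & set(ex[0].lower().split())),
            reverse=True,
        )
        return [code for _, code in ranked[:k]]


def run_checkers(code: str, checkers: List[Checker]) -> List[str]:
    """Collect diagnostics from every tool into one flat list."""
    return [msg for check in checkers for msg in check(code)]


def repair_loop(
    code: str,
    generate: Generator,
    checkers: List[Checker],
    memory: RepairMemory,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Refine `code` until all checkers pass or `max_rounds` is reached.

    Returns the final code and any diagnostics that remain.
    """
    pending = run_checkers(code, checkers)
    for _ in range(max_rounds):
        if not pending:
            break
        query = " ".join(pending)
        examples = memory.retrieve(query)
        code = generate(code, pending, examples)
        remaining = run_checkers(code, checkers)
        if not remaining:
            # Remember the repair so later queries can retrieve it.
            memory.examples.append((query, code))
        pending = remaining
    return code, pending
```

Keeping the tools behind a uniform callable interface means the loop does not depend on which analyzers are plugged in, and an empty list of remaining diagnostics signals that every tool accepted the final program.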