Static Analysis as a Feedback Loop: Enhancing LLM-Generated Code Beyond Correctness

📅 2025-08-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing code generation benchmarks (e.g., HumanEval, MBPP) evaluate only functional correctness, neglecting critical quality dimensions such as security, reliability, readability, and maintainability. To address this gap, we propose a static analysis-driven iterative prompting framework that systematically integrates multidimensional quality feedback into LLM-based code generation. Our method employs Bandit and Pylint to detect violations across these dimensions; targeted repair prompts built from the analysis results then guide GPT-4o to regenerate the code, closing the optimization loop. Experiments demonstrate that after ten iterations, security vulnerabilities decrease by 67%, readability violations drop from over 80% to 11%, and reliability warnings decline by 78%. This work advances beyond correctness-only evaluation paradigms by introducing a scalable, quality-aware methodology for controllable optimization of LLM-generated code, systematically enforcing non-functional requirements in generative code synthesis.

📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities in code generation, achieving high scores on benchmarks such as HumanEval and MBPP. However, these benchmarks primarily assess functional correctness and neglect broader dimensions of code quality, including security, reliability, readability, and maintainability. In this work, we systematically evaluate the ability of LLMs to generate high-quality code across multiple dimensions using the PythonSecurityEval benchmark. We introduce an iterative static analysis-driven prompting algorithm that leverages Bandit and Pylint to identify and resolve code quality issues. Our experiments with GPT-4o show substantial improvements: security issues reduced from >40% to 13%, readability violations from >80% to 11%, and reliability warnings from >50% to 11% within ten iterations. These results demonstrate that LLMs, when guided by static analysis feedback, can significantly enhance code quality beyond functional correctness.
Problem

Research questions and friction points this paper is trying to address.

Improving LLM-generated code quality beyond correctness
Addressing security, reliability, and readability issues in generated code
Reducing static analysis violations through feedback-enhanced generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative static analysis-driven prompting algorithm
Leveraging Bandit and Pylint tools
Resolving code quality issues systematically
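The closed loop described above (analyze, prompt, regenerate, repeat) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `call_llm` callback is a placeholder for a GPT-4o API call, and the repair-prompt wording is invented for this sketch. Bandit and Pylint are invoked through their real CLIs with JSON output.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def run_static_analysis(code: str) -> list[str]:
    """Run Bandit (security) and Pylint (readability/reliability) on a
    code snippet and return a flat list of human-readable findings."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "snippet.py"
        src.write_text(code)
        findings = []
        # Bandit emits a JSON report; each result has a test_id (B###)
        # and an issue_text describing the security problem.
        out = subprocess.run(
            ["bandit", "-f", "json", "-q", str(src)],
            capture_output=True, text=True,
        )
        for issue in json.loads(out.stdout or "{}").get("results", []):
            findings.append(f"[security] {issue['test_id']}: {issue['issue_text']}")
        # Pylint's JSON output is a list of messages with a type
        # (convention/warning/error), a message-id, and a message.
        out = subprocess.run(
            ["pylint", "--output-format=json", str(src)],
            capture_output=True, text=True,
        )
        for msg in json.loads(out.stdout or "[]"):
            findings.append(f"[{msg['type']}] {msg['message-id']}: {msg['message']}")
        return findings


def build_repair_prompt(code: str, findings: list[str]) -> str:
    """Turn static-analysis findings into a targeted repair prompt
    (wording here is illustrative, not the paper's actual prompt)."""
    bullet_list = "\n".join(f"- {f}" for f in findings)
    return (
        "Revise the following Python code to resolve every issue listed, "
        "without changing its functional behaviour.\n\n"
        f"Issues:\n{bullet_list}\n\nCode:\n{code}"
    )


def refine(code: str, call_llm, max_iters: int = 10) -> str:
    """Closed-loop refinement: analyze, build a repair prompt, ask the
    LLM to regenerate, and repeat until clean or the budget runs out."""
    for _ in range(max_iters):
        findings = run_static_analysis(code)
        if not findings:
            break  # clean under both analyzers
        code = call_llm(build_repair_prompt(code, findings))
    return code
```

The ten-iteration budget mirrors the paper's reported setup; in practice the loop usually terminates early once both analyzers report no remaining violations.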