SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a persistent problem: large language models frequently introduce verifiable security vulnerabilities into generated code, even when explicitly instructed to avoid them. To mitigate this, the authors propose SecureForge, an automated framework that refines system prompts using vulnerability audit feedback. SecureForge first identifies benign prompts that inadvertently induce vulnerabilities, then uses Markov-chain sampling to expand them into a diverse synthetic prompt corpus, and finally optimizes system prompts iteratively against that corpus. The resulting system prompts generalize zero-shot to real-world coding-agent prompts, with no real user data used during optimization. Experiments show that the approach reduces security vulnerabilities by up to 48% on frontier large language models while maintaining or improving unit test pass rates, a Pareto improvement in both security and functional correctness.
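The summary does not spell out how the Markov-chain sampling works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes prompts can be reduced to a toy (task, setting) pair, and it replaces the "generate code, then run a static analyzer" step with a hypothetical predicate `triggers_vulnerability`. Every name here (`TASKS`, `SETTINGS`, `mutate`, `amplify`) is invented for illustration.

```python
import random

# Hypothetical building blocks for toy prompts; real prompts would be free text.
TASKS = [
    "build a SQL query from user input",
    "parse an uploaded YAML file",
    "render a user profile page",
    "sum a list of integers",
]
SETTINGS = ["in a Flask app", "in a CLI tool", "in a batch job"]

# Assumption: toy stand-in for "call the model, then audit its output statically".
RISKY_TASKS = set(TASKS[:3])


def triggers_vulnerability(prompt):
    task, _setting = prompt
    return task in RISKY_TASKS


def mutate(prompt, rng):
    """Propose a neighbor by resampling one component of the current prompt."""
    task, setting = prompt
    if rng.random() < 0.5:
        task = rng.choice(TASKS)
    else:
        setting = rng.choice(SETTINGS)
    return (task, setting)


def amplify(seed, steps, rng):
    """Random walk in which the next proposal depends only on the current prompt."""
    current = seed
    corpus = {seed}
    for _ in range(steps):
        candidate = mutate(current, rng)
        # Accept only moves that keep the vulnerability-inducing property,
        # so the corpus grows in variety without diluting the error rate.
        if triggers_vulnerability(candidate):
            current = candidate
            corpus.add(candidate)
    return corpus


corpus = amplify((TASKS[0], SETTINGS[0]), steps=200, rng=random.Random(0))
```

In this toy, rejecting benign candidates is what keeps the error rate high, while the random walk over tasks and settings supplies the diversity. A real pipeline would need a far richer mutation step and an actual audit of model output.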

📝 Abstract

LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure production code with relevant weaknesses to avoid in context, we find that they still produce verifiable vulnerabilities on average 23% of the time across a corpus of 250 benign coding prompts. We introduce SecureForge, an automated pipeline that both audits security risks of frontier models and produces auditing-informed secure system prompts that reduce output security vulnerabilities while maintaining unit test performance. SecureForge first identifies benign prompts that produce statically detectable vulnerabilities, and then amplifies them into a large synthetic prompt corpus of diverse scenarios using a Markovian sampling technique to jointly maintain error rates and prompt diversity. This corpus is then used to iteratively optimize the system prompts to reduce output security vulnerabilities. On frontier models, SecureForge yields a statistically significant Pareto improvement in both unit test success and output security, with output vulnerabilities reduced by up to 48%. The resulting system prompts transfer zero-shot to in-the-wild coding agent prompts, without any exposure to real user prompt distributions during optimization.
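The audit-then-optimize loop described in the abstract can be illustrated with a small sketch. It makes loud assumptions: the optimizer shown is a plain greedy search, swapped in for whatever optimizer the paper actually uses; the corpus, the guideline pool, and both scoring functions are hypothetical stand-ins for real model calls, static analysis, and unit tests; and the toy numbers carry no relation to the paper's results.

```python
# Toy audit corpus: each entry names the weakness class a prompt tends to trigger.
CORPUS = ["sql_injection", "sql_injection", "path_traversal", "xss", "none", "none"]

# Hypothetical guideline pool: guideline text -> weakness class it suppresses.
GUIDELINES = {
    "Use parameterized queries.": "sql_injection",
    "Normalize and validate file paths.": "path_traversal",
    "Escape user data before rendering HTML.": "xss",
}


def vulnerability_rate(system_prompt):
    """Stand-in for: generate code per prompt, run a static analyzer, count hits."""
    covered = {GUIDELINES[g] for g in system_prompt}
    hits = sum(1 for w in CORPUS if w != "none" and w not in covered)
    return hits / len(CORPUS)


def unit_test_pass_rate(system_prompt):
    """Stand-in for running unit tests on generated code; constant in this toy."""
    return 0.9


def optimize(max_rounds=5):
    """Greedily add the guideline that most reduces audited vulnerabilities."""
    prompt = []
    for _ in range(max_rounds):
        best, best_rate = None, vulnerability_rate(prompt)
        for guideline in GUIDELINES:
            if guideline in prompt:
                continue
            candidate = prompt + [guideline]
            rate = vulnerability_rate(candidate)
            # Pareto-style check: security must improve, tests must not regress.
            if rate < best_rate and unit_test_pass_rate(candidate) >= unit_test_pass_rate(prompt):
                best, best_rate = guideline, rate
        if best is None:
            break  # no remaining guideline improves the audit score
        prompt.append(best)
    return prompt


baseline = vulnerability_rate([])
final_prompt = optimize()
```

The acceptance test is the part that mirrors the abstract's Pareto framing: a candidate system prompt is kept only when the audited vulnerability rate drops and the unit test pass rate does not fall.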

Problem

Research questions and friction points this paper is trying to address.

LLM-generated code

cybersecurity vulnerabilities

secure coding

code security

vulnerability detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt optimization

LLM security

vulnerability detection