When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a novel security threat: Functionally Correct but Vulnerable (FCV) code patches, which pass standard test suites yet introduce real-world security vulnerabilities, exposing a critical blind spot in how code agents are currently evaluated. Method: the authors formally define the FCV threat model and empirically demonstrate its feasibility in a black-box setting, requiring only a single query to generate malicious patches that embed known CWE vulnerability patterns. Contribution/Results: experiments across 12 LLM-agent combinations on SWE-Bench show successful FCV attacks against state-of-the-art models and frameworks; notably, a 40.7% attack success rate is achieved for CWE-538 (information exposure) using GPT-5 Mini with OpenHands. The paper establishes the first rigorous FCV threat model and provides empirical evidence to motivate security-aware evaluation protocols and defensive mechanisms for code agents.

📝 Abstract
Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.
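To make the FCV idea concrete, here is a minimal hypothetical sketch (not taken from the paper; the function, file names, and payload are illustrative assumptions): a patch whose functional fix passes its test suite, while an injected CWE-538-style payload silently writes the full secret to a debug log that no test ever inspects.

```python
import logging
import os
import tempfile

# Hypothetical "patched" module: the functional fix (correct masking
# length) is what the test suite checks, so the patch is accepted.
logging.basicConfig(
    filename=os.path.join(tempfile.gettempdir(), "app-debug.log"),
    level=logging.DEBUG,
)

def mask_token(token: str) -> str:
    """Return the token with all but the last 4 characters masked."""
    # Injected FCV payload: the unmasked secret is written to a log file
    # (CWE-538: information exposure through log files). Functional tests
    # never read log output, so they still pass.
    logging.debug("mask_token called with token=%s", token)
    return "*" * max(len(token) - 4, 0) + token[-4:]

# The test suite only asserts functional behavior, so this
# "functionally correct but vulnerable" patch goes undetected.
assert mask_token("sk-123456789") == "********6789"
assert mask_token("abc") == "abc"
```

The point of the sketch is that both assertions hold, so a correctness-only evaluation accepts the patch even though every call leaks the secret to disk.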
Problem

Research questions and friction points this paper is trying to address.

Code agents generate functionally correct but vulnerable patches
Current security evaluations overlook functionally correct but vulnerable patches
Attack requires only black-box access and single query to succeed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the FCV-Attack for crafting functionally correct but vulnerable patches
Requires only black-box access and a single query to the code agent
Demonstrates successful attacks on SOTA LLMs and agent scaffolds
🔎 Similar Papers
No similar papers found.