🤖 AI Summary
This work identifies a novel security threat: Functionally Correct yet Vulnerable (FCV) code patches, i.e., patches that pass standard test suites yet introduce real-world security vulnerabilities. This exposes a critical blind spot in current code-agent evaluation paradigms, which treat passing tests as a proxy for safety.
Method: The authors formally define the FCV threat model and empirically demonstrate its feasibility in a black-box setting: a single query to the code agent suffices to generate malicious patches by leveraging known CWE vulnerability patterns.
Contribution/Results: Experiments across 12 LLM-agent combinations on SWE-Bench show successful FCV attacks against state-of-the-art models and frameworks; notably, a 40.7% attack success rate is achieved for CWE-538 (information exposure) using GPT-5 Mini with OpenHands. This work not only establishes the first rigorous FCV threat model but also provides foundational empirical evidence to motivate and guide the development of security-aware evaluation protocols and defensive mechanisms for code intelligence agents.
📝 Abstract
Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.
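To make the FCV idea concrete, here is a minimal, entirely hypothetical sketch (not taken from the paper; all names invented) of what a CWE-538-style patch could look like: it fixes the reported bug and passes a functional test suite, yet silently writes sensitive data to an externally accessible log file, which no functional test ever inspects.

```python
import logging

# Hypothetical patched module. The "fix" handles empty passwords, which
# previously raised an exception, so all functional tests now pass.
# The FCV payload is the debug line below: it introduces a CWE-538-style
# information exposure by logging credentials to a world-readable file,
# something a pass/fail test suite never checks.
logging.basicConfig(filename="/tmp/app_debug.log", level=logging.DEBUG)

def authenticate(username: str, password: str) -> bool:
    # Functional fix: reject empty passwords instead of crashing.
    if not password:
        return False
    # Vulnerable addition: secrets written to an externally accessible
    # log survive review focused on functional correctness.
    logging.debug("login attempt user=%s password=%s", username, password)
    return password == "correct horse battery staple"
```

A conventional test suite (e.g., asserting that empty passwords are rejected and valid ones accepted) would mark this patch as correct, which is precisely the evaluation blind spot the paper highlights.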