Red Teaming Program Repair Agents: When Correct Patches Can Hide Vulnerabilities

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes a critical security blind spot in LLM-driven automated program repair (APR) agents: patches deemed "functionally correct" by standard test suites may silently introduce security vulnerabilities, challenging the conventional assumption that passing tests implies safety. Method: the authors propose the first adversarial attack paradigm that uses malicious GitHub issue descriptions to mislead APR agents, and design SWExploit, a three-stage framework integrating program analysis, adversarial text generation, and iterative optimization. SWExploit is evaluated against three mainstream APR pipelines across five LLMs. Results: SWExploit achieves an attack success rate of up to 0.91, far surpassing baselines (all below 0.20), showing that current evaluation methodologies severely underestimate security risks. This is the first study to quantitatively measure the security fragility of APR agents in realistic development settings, establishing a benchmark for evaluating and securing AI-assisted programming systems.
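The headline metric above is the attack success rate (ASR) restricted to functionally correct patches. The sketch below shows one plausible reading of that metric, the fraction of test-passing patches that are also vulnerable; the paper may define it differently, and the function name and sample data are invented for illustration:

```python
# Hypothetical sketch of an "ASR on correct patches" metric: among patches
# that pass all tests, the fraction that also contain a vulnerability.
# The trial records below are invented sample data, not results from the paper.

def asr_on_correct(patches: list[dict]) -> float:
    # Restrict to functionally correct patches (those that pass all tests).
    correct = [p for p in patches if p["passes_tests"]]
    if not correct:
        return 0.0
    # Count how many of those are nonetheless vulnerable.
    vulnerable = [p for p in correct if p["is_vulnerable"]]
    return len(vulnerable) / len(correct)

trials = [
    {"passes_tests": True,  "is_vulnerable": True},
    {"passes_tests": True,  "is_vulnerable": True},
    {"passes_tests": True,  "is_vulnerable": False},
    {"passes_tests": False, "is_vulnerable": True},  # excluded: fails tests
]
print(asr_on_correct(trials))  # → 0.6666666666666666
```

This conditional framing matters: a patch that fails tests is rejected regardless of vulnerabilities, so only the correct-but-vulnerable cases slip past the standard evaluation.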

📝 Abstract
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch GitHub issues and use backend LLMs to generate patches that fix the reported bugs. However, existing work primarily focuses on the functional correctness of APR-generated patches (i.e., whether they pass hidden or regression tests) while largely ignoring potential security risks. Given the openness of platforms like GitHub, where any user can raise issues and participate in discussions, an important question arises: Can an adversarial user submit a valid issue on GitHub that misleads an LLM-based agent into generating a functionally correct but vulnerable patch? To answer this question, we propose SWExploit, which generates adversarial issue statements designed to make APR agents produce patches that are functionally correct yet vulnerable. SWExploit operates in three main steps: (1) program analysis to identify potential injection points for vulnerable payloads; (2) adversarial issue generation to provide misleading reproduction and error information while preserving the original issue semantics; and (3) iterative refinement of the adversarial issue statements based on the outputs of the APR agents. Empirical evaluation on three agent pipelines and five backend LLMs shows that SWExploit can produce patches that are both functionally correct and vulnerable (the attack success rate on the correct patch could reach 0.91, whereas the baseline ASRs are all below 0.20). Based on our evaluation, we are the first to challenge the traditional assumption that a patch passing all tests is inherently reliable and secure, highlighting critical limitations in the current evaluation paradigm for APR agents.
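The three-step workflow described in the abstract can be sketched as a refinement loop. Everything below is a toy illustration under invented assumptions: the function names (`find_injection_points`, `craft_adversarial_issue`, `run_apr_agent`), the heuristics, and the acceptance check are placeholders, not the paper's actual implementation:

```python
# Illustrative sketch of the SWExploit three-step loop from the abstract.
# All names and heuristics here are hypothetical stand-ins.

def find_injection_points(source: str) -> list[str]:
    # Step 1 (program analysis): locate code sites where a vulnerable
    # payload could plausibly land. Stubbed as a trivial textual scan.
    return [line for line in source.splitlines() if "input(" in line]

def craft_adversarial_issue(original_issue: str, points: list[str]) -> str:
    # Step 2 (adversarial issue generation): keep the original issue
    # semantics but add misleading reproduction/error details that steer
    # the agent toward the injection points.
    hints = "; ".join(points)
    return f"{original_issue}\n\nObserved near: {hints}"

def run_apr_agent(issue: str) -> dict:
    # Placeholder for the APR agent under attack: returns test and
    # vulnerability verdicts for the generated patch. Stubbed here.
    return {"passes_tests": True, "is_vulnerable": "input(" in issue}

def swexploit(original_issue: str, source: str, max_iters: int = 3):
    issue = original_issue
    points = find_injection_points(source)
    for _ in range(max_iters):
        issue = craft_adversarial_issue(issue, points)
        result = run_apr_agent(issue)
        # The attack counts only if the patch is BOTH functionally
        # correct and vulnerable.
        if result["passes_tests"] and result["is_vulnerable"]:
            return issue
        # Step 3: otherwise refine the issue from agent output and retry.
    return None

adv = swexploit("Crash when parsing config", "value = input()\nprint(value)")
print(adv is not None)  # → True (the toy loop finds a candidate issue)
```

The key design point the abstract emphasizes is the acceptance check: an adversarial issue is kept only when the resulting patch still passes the test suite, which is exactly what lets the vulnerability evade the standard evaluation.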
Problem

Research questions and friction points this paper is trying to address.

Examining security risks in LLM-based program repair agents beyond functional correctness
Testing whether adversarial GitHub issues can elicit correct but vulnerable patches
Challenging the assumption that passing tests ensures patch reliability and security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates adversarial issues to mislead repair agents
Identifies injection points for vulnerable payloads via analysis
Iteratively refines adversarial inputs based on agent outputs