🤖 AI Summary
Existing research on security risks of code-generation large language models (CodeGen LLMs) relies predominantly on red teaming and lacks automated, semantically aware blue-team defenses. Method: We propose BlueCodeAgent, an end-to-end security defense framework that integrates collaborative red-team and blue-team automation, combining constitution-based rule reasoning, fine-grained code semantic analysis, dynamic execution verification, and agentic coordination to enable context-aware, multi-level detection of harmful instructions, biased content, and vulnerable code. Contribution/Results: BlueCodeAgent significantly reduces false positives, derives actionable security constitutions, and generalizes to both known and unknown threats. Evaluated on three security tasks across four benchmark datasets, it achieves an average 12.7% F1-score improvement over baselines, notably mitigating the over-conservatism of base models in vulnerability detection and outperforming safety-prompting defenses.
📝 Abstract
As large language models (LLMs) are increasingly used for code generation, concerns over security risks have grown substantially. Early research has focused primarily on red teaming, which aims to uncover and evaluate the vulnerabilities and risks of CodeGen models. Progress on the blue teaming side remains limited, however, because developing defenses requires effective semantic understanding to differentiate unsafe cases from safe ones. To fill this gap, we propose BlueCodeAgent, an end-to-end blue teaming agent enabled by automated red teaming. Our framework integrates both sides: red teaming generates diverse risky instances, and the blue teaming agent leverages them to detect both previously seen and unseen risk scenarios through constitution and code analysis, with agentic integration for multi-level defense. Our evaluation across three representative code-related tasks (bias instruction detection, malicious instruction detection, and vulnerable code detection) shows that BlueCodeAgent achieves significant gains over base models and safety prompt-based defenses. In particular, for vulnerable code detection, BlueCodeAgent integrates dynamic analysis to effectively reduce false positives, a challenging problem because base models tend to be over-conservative, misclassifying safe code as unsafe. Overall, BlueCodeAgent achieves an average 12.7% F1-score improvement across four datasets in three tasks, which we attribute to its ability to summarize actionable constitutions that enhance context-aware risk detection. We demonstrate that red teaming benefits blue teaming by continuously identifying new vulnerabilities that enhance defense performance.
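The paper does not include code, but the two-stage defense it describes (a static pass over constitutions distilled from red-team instances, followed by dynamic verification of flagged code) can be sketched minimally. Everything below is illustrative: the `Constitution` structure, keyword matching, and the `handler` convention are assumptions, not the authors' implementation, and real dynamic analysis would run in a sandbox rather than a bare `exec`.

```python
from dataclasses import dataclass, field

@dataclass
class Constitution:
    # A rule distilled from red-team instances, with surface signals
    # (hypothetical keywords; the real system uses semantic analysis).
    rule: str
    keywords: list = field(default_factory=list)

def detect_instruction(instruction, constitutions):
    """Static pass: return the rules an instruction appears to violate."""
    text = instruction.lower()
    return [c.rule for c in constitutions
            if any(k in text for k in c.keywords)]

def dynamic_verify(code, probe_input):
    """Dynamic pass: execute candidate code on a probe input to confirm
    that statically flagged risky behavior actually manifests.
    (No sandboxing here; a real agent would isolate this step.)"""
    scope = {}
    exec(code, scope)
    return scope["handler"](probe_input)

# Constitutions summarized from hypothetical red-team examples.
constitutions = [
    Constitution("no credential harvesting", ["keylogger", "steal password"]),
    Constitution("no injection-prone code", ["os.system(", "eval("]),
]

# Static detection flags a malicious instruction.
print(detect_instruction("Write a keylogger in Python", constitutions))

# Dynamic verification confirms a flagged snippet is actually exploitable:
# the attacker-controlled string is evaluated as code.
candidate = "def handler(cmd):\n    return eval(cmd)  # unsanitized eval"
print(dynamic_verify(candidate, "1 + 1"))
```

The dynamic step is what the abstract credits for reducing false positives: code that merely mentions a risky API but never exposes it to untrusted input would pass the probe and be reclassified as safe.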