🤖 AI Summary
Chain-of-thought (CoT)-enhanced code generation models are vulnerable to backdoor attacks, and existing defenses fail to mitigate them effectively. To address this, we propose GUARD, a dual-agent collaborative defense framework. Its contributions are twofold: (1) GUARD-Judge performs fine-grained anomaly detection by analyzing CoT steps across multiple dimensions and identifying trigger patterns; (2) GUARD-Repair combines retrieval-augmented generation (RAG) with adversarial reasoning verification to repair suspicious reasoning steps while preserving semantic consistency. Evaluated on multiple code generation benchmarks, GUARD reduces backdoor attack success rates to below 1.2% while incurring only a 0.8% drop in Pass@1, significantly outperforming state-of-the-art defenses. To our knowledge, GUARD is the first approach to jointly guarantee both high security against backdoors and high code generation quality.
📄 Abstract
With the widespread application of large language models in code generation, recent studies demonstrate that employing additional Chain-of-Thought (CoT) generation models can significantly enhance code generation performance by providing explicit reasoning steps. However, as external components, CoT models are particularly vulnerable to backdoor attacks, which existing defense mechanisms often fail to detect effectively. To address this challenge, we propose GUARD, a novel dual-agent defense framework specifically designed to counter CoT backdoor attacks in neural code generation. GUARD integrates two core components: GUARD-Judge, which identifies suspicious CoT steps and potential triggers through comprehensive analysis, and GUARD-Repair, which employs a retrieval-augmented generation approach to regenerate secure CoT steps for identified anomalies. Experimental results show that GUARD effectively mitigates attacks while maintaining generation quality, advancing secure code generation systems.
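The two-stage flow described above (flag suspicious CoT steps, then regenerate them from a trusted source) can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the trigger patterns, function names, and the `retrieve` callback standing in for the RAG component are all assumptions for demonstration.

```python
# Hypothetical sketch of the GUARD pipeline: all names and the toy
# pattern list are illustrative assumptions, not the authors' code.

SUSPICIOUS_PATTERNS = ["import os; os.system", "<!-- trigger -->"]  # toy trigger list

def guard_judge(cot_steps):
    """Stand-in for GUARD-Judge: flag CoT steps matching trigger-like patterns."""
    return [i for i, step in enumerate(cot_steps)
            if any(p in step for p in SUSPICIOUS_PATTERNS)]

def guard_repair(cot_steps, flagged, retrieve):
    """Stand-in for GUARD-Repair: replace flagged steps with a retrieved,
    trusted alternative (here `retrieve` abstracts the RAG component)."""
    repaired = list(cot_steps)
    for i in flagged:
        repaired[i] = retrieve(cot_steps[i])
    return repaired

def defend(cot_steps, retrieve):
    """Run detection, then repair only the anomalous steps."""
    return guard_repair(cot_steps, guard_judge(cot_steps), retrieve)

# Usage with a poisoned reasoning chain:
steps = [
    "Parse the input string.",
    "Call import os; os.system to clean up temp files.",  # injected step
    "Return the sorted list.",
]
safe = defend(steps, retrieve=lambda s: "Use only safe standard-library calls.")
```

The key design point mirrored here is that repair is targeted: benign steps pass through untouched, which is how the framework can keep the Pass@1 cost low while neutralizing the trigger.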