🤖 AI Summary
Chain-of-thought (CoT)-enhanced code generation models are vulnerable to backdoor attacks, and existing defenses fail to mitigate them effectively. To address this, we propose GUARD, a dual-agent collaborative defense framework. Its contributions are twofold: (1) GUARD-Judge performs fine-grained anomaly detection by analyzing CoT steps across multiple dimensions and identifying trigger patterns; (2) GUARD-Repair combines retrieval-augmented generation (RAG) with adversarial reasoning verification to repair suspicious reasoning steps while preserving semantic consistency. Evaluated on multiple code generation benchmarks, GUARD reduces backdoor attack success rates to below 1.2% while incurring only a 0.8% drop in Pass@1, significantly outperforming state-of-the-art defenses. To our knowledge, GUARD is the first approach to jointly guarantee both high security against backdoors and high code generation quality.
📄 Abstract
With the widespread application of large language models in code generation, recent studies demonstrate that employing additional Chain-of-Thought (CoT) generation models can significantly enhance code generation performance by providing explicit reasoning steps. However, as external components, CoT models are particularly vulnerable to backdoor attacks, which existing defense mechanisms often fail to detect effectively. To address this challenge, we propose GUARD, a novel dual-agent defense framework specifically designed to counter CoT backdoor attacks in neural code generation. GUARD integrates two core components: GUARD-Judge, which identifies suspicious CoT steps and potential triggers through comprehensive analysis, and GUARD-Repair, which employs a retrieval-augmented generation approach to regenerate secure CoT steps for identified anomalies. Experimental results show that GUARD effectively mitigates attacks while maintaining generation quality, advancing secure code generation systems.
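The two-stage flow described above (flag suspicious CoT steps, then regenerate them from a trusted source) can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the trigger patterns, function names, and the `retrieve` callback standing in for the RAG component are all assumptions for demonstration.

```python
# Hypothetical sketch of the GUARD pipeline: all names and the toy
# pattern list are illustrative assumptions, not the authors' code.

SUSPICIOUS_PATTERNS = ["import os; os.system", "<!-- trigger -->"]  # toy trigger list

def guard_judge(cot_steps):
    """Stand-in for GUARD-Judge: flag CoT steps matching trigger-like patterns."""
    return [i for i, step in enumerate(cot_steps)
            if any(p in step for p in SUSPICIOUS_PATTERNS)]

def guard_repair(cot_steps, flagged, retrieve):
    """Stand-in for GUARD-Repair: replace flagged steps with a retrieved,
    trusted alternative (here `retrieve` abstracts the RAG component)."""
    repaired = list(cot_steps)
    for i in flagged:
        repaired[i] = retrieve(cot_steps[i])
    return repaired

def defend(cot_steps, retrieve):
    """Run detection, then repair only the anomalous steps."""
    return guard_repair(cot_steps, guard_judge(cot_steps), retrieve)

# Usage with a poisoned reasoning chain:
steps = [
    "Parse the input string.",
    "Call import os; os.system to clean up temp files.",  # injected step
    "Return the sorted list.",
]
safe = defend(steps, retrieve=lambda s: "Use only safe standard-library calls.")
```

The key design point mirrored here is that repair is targeted: benign steps pass through untouched, which is how the framework can keep the Pass@1 cost low while neutralizing the trigger.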