CodeGuard: A Generalized and Stealthy Backdoor Watermarking for Generative Code Models

📅 2025-06-25

🤖 AI Summary

To address the poor generalizability and weak concealment of watermarks in generative code models (GCMs), this paper proposes a generalizable and stealthy backdoor watermarking method. The approach uses attention mechanisms to identify sensitive embedding positions, then combines distributed triggers with homoglyph character substitution (replacing characters with visually identical look-alikes, which the abstract calls "homomorphic character replacement"). The authors report no degradation in primary-task performance, together with better cross-task and cross-dataset watermark generalizability and stronger resistance to detection. Experiments show watermark verification rates of up to 100% on both code generation and code summarization; against the ONION detector, the highest detection rate is only 0.078, substantially lower than that of existing baselines. The core contributions are: (i) an attention-guided embedding mechanism that selects watermark insertion positions, and (ii) a character-substitution design intended to leave watermarked code visually unchanged while preserving its functionality.
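The attention-guided placement described above can be illustrated with a minimal sketch. It assumes a per-token importance score has already been extracted from the model (for example, by averaging attention weights, which Hugging Face Transformers returns when called with `output_attentions=True`). How CodeGuard actually aggregates attention and chooses positions is not given in this summary, so `select_positions`, its greedy ranking, and the `min_gap` spacing rule are hypothetical; the spacing only mimics the idea of a distributed rather than contiguous trigger.

```python
from typing import List, Sequence


def select_positions(attention: Sequence[float], k: int, min_gap: int = 1) -> List[int]:
    """Greedily pick up to k token indices with the highest scores, keeping every
    pair at least min_gap tokens apart. Returns indices in source order.

    Hypothetical helper: not CodeGuard's published selection rule.
    """
    if k < 0 or min_gap < 1:
        raise ValueError("k must be >= 0 and min_gap must be >= 1")
    # Rank indices by descending score; ties go to the earlier token.
    ranked = sorted(range(len(attention)), key=lambda i: (-attention[i], i))
    chosen: List[int] = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Enforce spacing so the selected positions are spread out.
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
    return sorted(chosen)


tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
# Made-up per-token scores standing in for aggregated attention weights.
attention = [0.05, 0.30, 0.01, 0.20, 0.01, 0.18, 0.01, 0.01, 0.10, 0.06, 0.02, 0.05]
for pos in select_positions(attention, k=3, min_gap=3):
    print(pos, tokens[pos])
```

With `min_gap=3` the sketch skips the second-ranked token `a` (too close to `add`) and picks indices 1, 5 and 8. With `min_gap=1` it reduces to a plain top-k and returns 1, 3 and 5.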

📝 Abstract

Generative code models (GCMs) significantly enhance development efficiency through automated code generation and code summarization. However, building and training these models require substantial computational resources and time, so effective digital copyright protection is needed to prevent unauthorized leaks and misuse. Backdoor watermarking, by embedding hidden identifiers, simplifies copyright verification by breaking the model's black-box nature. Current backdoor watermarking techniques face two main challenges: first, limited generalization across tasks and datasets, which causes verification rates to fluctuate; second, insufficient stealthiness, as watermarks are easily detected and removed by automated methods. To address these issues, we propose CodeGuard, a novel watermarking method that combines attention mechanisms with distributed trigger embedding. Specifically, CodeGuard uses attention mechanisms to identify watermark embedding positions, ensuring verifiability. Moreover, homomorphic character replacement helps the watermark evade manual inspection, while distributed trigger embedding reduces the likelihood of automated detection. Experimental results demonstrate that CodeGuard achieves watermark verification rates of up to 100% in both code summarization and code generation, with no impact on primary-task performance. In terms of stealthiness, CodeGuard's maximum detection rate under the ONION detection method is only 0.078, significantly lower than that of the baseline methods.
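For context on the stealthiness numbers: ONION flags a likely trigger word by measuring how much a language model's perplexity on the input drops when that word is deleted. Words whose removal lowers perplexity by more than a threshold are treated as suspicious and removed. The sketch below reproduces only that scoring rule. A toy add-one-smoothed unigram model stands in for the pretrained language model (GPT-2) that ONION uses, and the corpus, the token `zq`, and the threshold value are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def unigram_perplexity(tokens: Sequence[str], counts: Counter, total: int) -> float:
    """Perplexity under an add-one-smoothed unigram model (toy stand-in for GPT-2)."""
    vocab_size = len(counts) + 1  # +1 reserves a bucket for unseen tokens
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def onion_scores(tokens: Sequence[str], counts: Counter, total: int) -> List[float]:
    """ONION-style suspicion score per token: perplexity drop when it is removed."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    base = unigram_perplexity(tokens, counts, total)
    return [
        base - unigram_perplexity(list(tokens[:i]) + list(tokens[i + 1:]), counts, total)
        for i in range(len(tokens))
    ]


corpus = "return a + b return a - b".split()
counts = Counter(corpus)
sentence = ["return", "a", "zq", "+", "b"]  # "zq" plays the role of an inserted trigger
scores = onion_scores(sentence, counts, len(corpus))
threshold = 0.5  # hypothetical cut-off
flagged = [t for t, s in zip(sentence, scores) if s > threshold]
print(flagged)
```

A single conspicuous token produces a large perplexity drop when removed. This matches the abstract's intuition that spreading the trigger across several positions makes automated detection less likely. The toy model does not measure CodeGuard's actual detection rate.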

Problem

Ensures copyright protection for generative code models

Improves watermark generalization across tasks and datasets

Enhances stealthiness so that watermarks are harder to detect and remove

Innovation

Combines attention mechanisms with distributed triggers

Uses homomorphic character replacement for stealth

Achieves verification rates of up to 100% on code summarization and code generation
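Homoglyph substitution is our reading of the "homomorphic character replacement" listed above. It swaps Latin letters for visually identical characters from another script, so a watermark is hard to spot by eye. This is a minimal sketch under stated assumptions. The `HOMOGLYPHS` table, the rule of editing one character per selected token, and the helper names are hypothetical, and the positions are given directly rather than derived from attention. The marks are applied to docstring words because the paper's rules for where substitutions are allowed are not described in this summary.

```python
from typing import Dict, List, Sequence

# Hypothetical, illustrative subset of Latin letters and Cyrillic look-alikes.
HOMOGLYPHS: Dict[str, str] = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}
LOOKALIKES = set(HOMOGLYPHS.values())


def embed_homoglyphs(tokens: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Replace the first substitutable character of each selected token."""
    marked = list(tokens)
    for pos in positions:
        tok = marked[pos]
        for i, ch in enumerate(tok):
            if ch in HOMOGLYPHS:
                marked[pos] = tok[:i] + HOMOGLYPHS[ch] + tok[i + 1:]
                break  # one character per position keeps each edit minimal
    return marked


def marked_positions(tokens: Sequence[str]) -> List[int]:
    """Owner-side check: indices of tokens containing any look-alike character."""
    return [i for i, tok in enumerate(tokens) if any(ch in LOOKALIKES for ch in tok)]


docstring = "Return the sum of a and b".split()
marked = embed_homoglyphs(docstring, positions=[0, 3])
print(" ".join(marked))          # typically renders like the original text
print(marked == docstring)       # False: the strings differ at the code-point level
print(marked_positions(marked))  # [0, 3]
```

Placing the marks in natural-language text is deliberate. In Python source, the same edit applied to an identifier or keyword yields a different name, because Python does not normalize Cyrillic letters to Latin ones, and that would change program behavior.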
