🤖 AI Summary
Existing code watermarking methods for detecting large language model (LLM)-generated code suffer from poor generalizability, high computational overhead, and reliance on white-box access. To address these limitations, we propose ACW, a training-free, black-box, lightweight code watermarking framework. ACW embeds detectable watermarks in generated code via semantic-preserving, idempotent structured code transformations. Its detection mechanism, based on statistical significance, robustly extracts watermarks without requiring model internals or retraining. Evaluated across diverse LLMs, including ChatGPT and StarCoder, ACW achieves >98% detection accuracy on generated code and demonstrates strong resilience against common adversarial attacks such as code deletion, modification, and obfuscation. Notably, ACW is the first method to overcome the transferability bottleneck of text-based watermarks in the code domain. It simultaneously delivers zero training cost, broad model agnosticism, and high detection reliability.
📝 Abstract
With the development of large language models, many AI models for code generation (such as ChatGPT and StarCoder) have become available and are widely adopted. It is often desirable to know whether a piece of code was generated by AI and, furthermore, which AI is the author. For instance, if a certain version of an AI is known to generate vulnerable code, it is particularly important to know the creator. Watermarking is broadly considered a promising solution and has been successfully applied to identifying AI-generated text. However, existing efforts on watermarking AI-generated code are far from ideal, and watermarking code poses more challenges than watermarking general text because of its limited flexibility and encoding space. In this work, we propose ACW (AI Code Watermarking), a novel method for watermarking AI-generated code. The key idea of ACW is to selectively apply a set of carefully designed semantic-preserving, idempotent code transformations, whose presence (or absence) allows us to determine the existence of watermarks. ACW is efficient, as it requires no training or fine-tuning and works in a black-box manner. Our experimental results show that ACW is effective (i.e., it achieves high accuracy in detecting AI-generated code and extracting watermarks) as well as resilient, significantly outperforming existing approaches.
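To make the key idea concrete, the following is a minimal sketch of one watermark bit carried by an idempotent transformation. It is an illustration, not the transformation set used by ACW: the comparison-flipping rule and the names `embed_bit` and `detect_bit` are hypothetical. The rewrite is also assumed to be semantic-preserving only when both operands are free of side effects and use standard comparison semantics. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical paired rewrites: each maps an ordering operator to its mirror.
_TO_LESS = {ast.Gt: ast.Lt, ast.GtE: ast.LtE}      # a > b  ->  b < a
_TO_GREATER = {ast.Lt: ast.Gt, ast.LtE: ast.GtE}   # a < b  ->  b > a


class _FlipComparisons(ast.NodeTransformer):
    """Rewrite simple (non-chained) comparisons into one canonical direction."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        if len(node.ops) == 1 and type(node.ops[0]) in self.mapping:
            mirrored = self.mapping[type(node.ops[0])]()
            # Swap the operands and mirror the operator.
            return ast.Compare(left=node.comparators[0], ops=[mirrored],
                               comparators=[node.left])
        return node


def embed_bit(source: str, bit: int) -> str:
    """Encode one bit: 1 -> only '<'/'<=' remain, 0 -> only '>'/'>=' remain."""
    mapping = _TO_LESS if bit else _TO_GREATER
    tree = _FlipComparisons(mapping).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


def detect_bit(source: str):
    """Return 1 or 0 if the code is in a canonical form, else None."""
    ops = [type(n.ops[0]) for n in ast.walk(ast.parse(source))
           if isinstance(n, ast.Compare) and len(n.ops) == 1]
    less = sum(op in _TO_GREATER for op in ops)     # counts '<' and '<='
    greater = sum(op in _TO_LESS for op in ops)     # counts '>' and '>='
    if less and not greater:
        return 1
    if greater and not less:
        return 0
    return None  # mixed directions or no ordering comparison: no evidence
```

Because the rewrite is idempotent, applying `embed_bit` to already-watermarked code leaves it unchanged, so the detector only needs to check which canonical form the code is in. A single bit like this is weak evidence on its own; combining several independent transformations is what would give the detector enough encoding space to distinguish watermarked from unwatermarked code.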