Is The Watermarking Of LLM-Generated Code Robust?

📅 2024-03-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing text watermarking methods exhibit severely degraded robustness against semantic-preserving transformations (such as variable renaming and dead-code insertion) when applied to large language model (LLM)-generated code, revealing significantly weaker resilience in code than in natural language. Method: We propose an AST-based randomized semantic-preserving transformation framework for systematic, reproducible, and fine-grained evaluation of watermark detectability. Contribution/Results: Experiments show that even mild transformations reduce the true positive rate (TPR) of mainstream watermarking methods on Python code to below 50%, demonstrating inadequate practical protection. This work establishes the first standardized benchmark for code watermarking evaluation and provides critical failure insights, motivating the development of programming-language-specific watermarking mechanisms.
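The variable-renaming attack the summary mentions can be sketched with Python's standard `ast` module. This is a minimal illustration, not the paper's implementation: it collects names that the program binds (so builtins like `print` are never touched) and rewrites them to fresh identifiers, leaving behavior unchanged while altering the surface tokens a watermark detector sees.

```python
import ast

def bound_names(tree):
    # Names assigned somewhere in the program (Store context); builtins
    # such as print/range never appear here, so they are left alone.
    return {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}

class RenameVariables(ast.NodeTransformer):
    """Rename bound variables to fresh identifiers (semantics-preserving)."""
    def __init__(self, bound):
        self.mapping = {name: f"v{i}" for i, name in enumerate(sorted(bound))}

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

src = "total = 0\nfor item in range(4):\n    total = total + item\nprint(total)\n"
tree = ast.parse(src)
tree = RenameVariables(bound_names(tree)).visit(tree)
print(ast.unparse(tree))  # same behavior, different identifiers
```

`ast.unparse` (Python 3.9+) serializes the modified tree back to source, which is how a transformed program would be handed to a watermark detector.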

📝 Abstract
We present the first in-depth study on the robustness of existing watermarking techniques applied to code generated by large language models (LLMs). As LLMs increasingly contribute to software development, watermarking has emerged as a potential solution for detecting AI-generated code and mitigating misuse, such as plagiarism or the automated generation of malicious programs. While previous research has demonstrated the resilience of watermarking in the text setting, our work reveals that watermarking techniques are significantly more fragile in code-based contexts. Specifically, we show that simple semantic-preserving transformations, such as variable renaming and dead-code insertion, can effectively erase watermarks without altering the program's functionality. To systematically evaluate watermark robustness, we develop an algorithm that traverses the Abstract Syntax Tree (AST) of a watermarked program and applies a sequence of randomized, semantics-preserving transformations. Our experimental results, conducted on Python code generated by different LLMs, indicate that even minor modifications can drastically reduce watermark detectability, with true positive rates (TPR) dropping below 50% in many cases. Our code is publicly available at https://github.com/uiuc-arc/llm-code-watermark.
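The other attack the abstract names, dead-code insertion, can likewise be sketched over the AST. The snippet below is a hedged illustration (the paper's transformation set is richer): it appends an unreachable `if False:` branch after statements in a function body, so the program's behavior is untouched while its token sequence changes.

```python
import ast
import random

class InsertDeadCode(ast.NodeTransformer):
    """Insert unreachable statements after random statements in a function.
    A minimal sketch; p controls how densely dead code is inserted."""
    def __init__(self, rng, p=1.0):
        self.rng, self.p = rng, p

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        new_body = []
        for stmt in node.body:
            new_body.append(stmt)
            if self.rng.random() < self.p:
                # Dead branch: never executes, so semantics are preserved.
                dead = ast.parse("if False:\n    _unused = 0").body[0]
                new_body.append(dead)
        node.body = new_body
        return node

src = "def add(a, b):\n    s = a + b\n    return s\n"
tree = InsertDeadCode(random.Random(0)).visit(ast.parse(src))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # add() still returns a + b
```

Because the inserted branch can never run, any test suite for the original program still passes on the transformed version, which is what makes the attack cheap to mount.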
Problem

Research questions and friction points this paper is trying to address.

Evaluates robustness of watermarking in LLM-generated code
Identifies vulnerabilities to semantic-preserving code transformations
Develops AST-based algorithm to test watermark resilience
Innovation

Methods, ideas, or system contributions that make the work stand out.

AST traversal algorithm
Semantics-preserving transformations
Robustness evaluation framework
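The evaluation loop implied by these contributions (traverse the AST, apply a randomized sequence of semantics-preserving transforms, then re-test the watermark) can be sketched as follows. The transform class, the `k` parameter, and the driver function are illustrative assumptions, not the paper's actual framework; in the paper, the resulting code would be fed to a watermark detector to measure TPR.

```python
import ast
import random

class AddDeadBranch(ast.NodeTransformer):
    """Append an unreachable 'if False: pass' to every function body."""
    def __init__(self, rng):
        self.rng = rng

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.body.append(ast.parse("if False:\n    pass").body[0])
        return node

def apply_random_transforms(src, transform_classes, rng, k=3):
    # Chain k randomly chosen semantics-preserving transforms over the AST.
    tree = ast.parse(src)
    for _ in range(k):
        tree = rng.choice(transform_classes)(rng).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

rng = random.Random(42)
attacked = apply_random_transforms("def f(x):\n    return x * 2\n",
                                   [AddDeadBranch], rng)
print(attacked)  # still computes x * 2, but surface tokens differ
```

Seeding the random generator is what makes such an evaluation reproducible: the same seed yields the same transformation sequence across runs.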