SWaRL: Safeguard Code Watermarking via Reinforcement Learning

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing watermarking methods for code LLMs often compromise functional correctness by introducing syntactic or semantic errors, which makes it difficult to ensure both watermark detectability and code usability. This work proposes a reinforcement learning-based co-training framework that combines compiler feedback with a jointly trained confidential verifier: compiler signals reward functionally correct code, while the verifier's reward keeps watermark detection accuracy high. Fine-tuning with LoRA (Low-Rank Adaptation) makes the embedded watermark information transferable across model updates. The authors report that the approach fully preserves the functionality of watermarked code, improves robustness against code refactoring and adversarial transformations, and adds little computational overhead.
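
The reward design described above (compiler feedback for correctness plus a verifier score for detectability) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not SWaRL's actual reward: `compiles`, `combined_reward`, `toy_verifier`, the weight `alpha`, and the fixed `compile_penalty` are hypothetical names and values, Python's built-in `compile()` stands in for "compiler feedback" (it catches syntax errors only, not semantic bugs), and the toy verifier is a trivial string check rather than a trained model.

```python
from typing import Callable


def compiles(source: str) -> bool:
    """Stand-in for compiler feedback (assumption): checks that Python source
    compiles to bytecode. This catches syntax errors only, not runtime or
    semantic bugs."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def combined_reward(
    source: str,
    verifier: Callable[[str], float],
    alpha: float = 0.5,
    compile_penalty: float = -1.0,
) -> float:
    """Hypothetical reward: non-compiling code gets a fixed penalty; compiling
    code gets a correctness bonus mixed with the verifier's watermark-detection
    score, which is assumed to lie in [0, 1]."""
    if not compiles(source):
        return compile_penalty
    detect = verifier(source)
    return (1.0 - alpha) + alpha * detect


def toy_verifier(source: str) -> float:
    """Hypothetical 'verifier': treats use of enumerate() as the signature.
    A real confidential verifier would be a trained classifier."""
    return 1.0 if "enumerate(" in source else 0.0


good = "for i, x in enumerate([1, 2]):\n    print(i, x)\n"
bad = "for i in range(3)\n    print(i)\n"  # missing colon -> SyntaxError
plain = "x = 1\n"
print(combined_reward(good, toy_verifier))
print(combined_reward(bad, toy_verifier))
print(combined_reward(plain, toy_verifier))
```

Gating on compilation first means the policy can never earn watermark reward for broken code, which mirrors the summary's claim that correctness and detectability are optimized together rather than traded off.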

๐Ÿ“ Abstract
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by embedding unique and verifiable signatures in the generated output. Existing approaches rely on manually crafted transformation rules to preserve watermarked code functionality or manipulate token-generation probabilities at inference time, which are prone to compilation errors. To address these challenges, SWaRL employs a reinforcement learning-based co-training framework that uses compiler feedback for functional correctness and a jointly trained confidential verifier as a reward signal to maintain watermark detectability. Furthermore, SWaRL employs low-rank adaptation (LoRA) during fine-tuning, allowing the learned watermark information to be transferable across model updates. Extensive experiments show that SWaRL achieves higher watermark detection accuracy compared to prior methods while fully maintaining watermarked code functionality. The LoRA-based signature embedding steers the base model to generate and solve code in a watermark-specific manner without significant computational overhead. Moreover, SWaRL exhibits strong resilience against refactoring and adversarial transformation attacks.
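
The claim that LoRA makes the watermark transferable across model updates rests on the fact that a LoRA adapter is stored separately from the base weights and merged as W + (alpha / r) · B·A. The sketch below shows only that mechanical step in plain Python: the same hypothetical adapter (`A`, `B`) is merged into two different base weight matrices standing in for an original and an updated model. The names `matmul`, `apply_lora`, `w_v1`, and `w_v2` are illustrative assumptions. Whether the watermark behavior actually survives on an updated base model is an empirical result reported by the authors, not something this arithmetic demonstrates.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Standard LoRA merge: W + (alpha / rank) * B @ A.
    w is d_out x d_in, b is d_out x r, a is r x d_in."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical rank-1 adapter, learned once (values are illustrative only).
A = [[1.0, 2.0]]    # r x d_in  = 1 x 2
B = [[1.0], [0.0]]  # d_out x r = 2 x 1

# Two base weights standing in for an original and an updated model.
w_v1 = [[0.0, 0.0], [0.0, 0.0]]
w_v2 = [[1.0, 1.0], [1.0, 1.0]]

print(apply_lora(w_v1, A, B, alpha=1.0, rank=1))
print(apply_lora(w_v2, A, B, alpha=1.0, rank=1))
```

Because the adapter holds only d_out·r + r·d_in parameters instead of d_out·d_in, it is cheap to store and re-apply, which is consistent with the abstract's point about low computational overhead.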
Problem

Research questions and friction points this paper is trying to address.

code watermarking
intellectual property protection
functional correctness
compiler errors
watermark detectability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Code Watermarking
Low-Rank Adaptation (LoRA)
Compiler Feedback
Intellectual Property Protection