On Protecting Agentic Systems' Intellectual Property via Watermarking

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Agentic systems are vulnerable to imitation attacks, and existing LLM watermarking methods fail in this setting because the internal reasoning traces they need for verification are usually hidden. This work proposes AGENTWM, the first watermarking framework designed for agentic systems, which embeds watermarks in the observable action trajectory by subtly biasing the distribution over semantically equivalent tool-call paths. AGENTWM protects intellectual property in a grey-box setting while keeping the watermark both indistinguishable to users and statistically verifiable. Experiments across three complex domains show high detection accuracy with negligible impact on agent performance. The watermark also withstands adaptive adversaries, who cannot remove it without severely degrading the stolen model's utility.

📝 Abstract

The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model's utility.
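The abstract names two mechanisms: biasing the choice among functionally identical tool execution paths, and verifying the watermark with a statistical hypothesis test. The sketch below illustrates that general idea only and is not AGENTWM's actual algorithm. Everything in it is an assumption made for illustration: the `EQUIVALENT_PATHS` table, the tool names, the HMAC-keyed choice of a "preferred" path per request, the `bias` parameter, and the null hypothesis that an unwatermarked agent matches the keyed preference only at chance level.

```python
import hashlib
import hmac
import math
import random

# Hypothetical groups of tool-call paths assumed to be functionally equivalent.
EQUIVALENT_PATHS = {
    "weather": [("geocode", "weather_by_coords"), ("weather_by_city",)],
    "convert": [("get_rate", "multiply"), ("convert_currency",), ("get_rate", "calc")],
    "search": [("web_search", "summarize"), ("search_and_summarize",)],
}


def preferred_index(key: bytes, task: str, context: str, n: int) -> int:
    """Derive a secret, per-request preferred path index from the key."""
    msg = f"{task}|{context}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % n


def choose_path(key: bytes, task: str, context: str, bias: float, rng: random.Random):
    """With probability `bias` take the keyed path, otherwise pick uniformly."""
    paths = EQUIVALENT_PATHS[task]
    if rng.random() < bias:
        return paths[preferred_index(key, task, context, len(paths))]
    return rng.choice(paths)


def detect(key: bytes, observations):
    """One-sided z-test on how often observed paths match the keyed preference.

    `observations` is a list of (task, context, path) tuples.
    Null hypothesis (an assumption of this sketch): each observation matches
    the keyed path with probability 1 / number_of_equivalent_paths.
    Returns (z_score, p_value).
    """
    hits = 0
    expected = 0.0
    variance = 0.0
    for task, context, path in observations:
        paths = EQUIVALENT_PATHS[task]
        p0 = 1.0 / len(paths)
        if path == paths[preferred_index(key, task, context, len(paths))]:
            hits += 1
        expected += p0
        variance += p0 * (1.0 - p0)
    if variance == 0.0:
        raise ValueError("need at least one observation with two or more paths")
    z = (hits - expected) / math.sqrt(variance)
    # Upper-tail probability of a standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

In this sketch a verifier who holds the key collects (task, context, path) triples from a suspect model's visible trajectories and rejects the null hypothesis when `p_value` falls below a chosen significance level. A smaller `bias` is harder to notice but needs more observations to reach the same confidence.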

Problem

Research questions and friction points this paper is trying to address.

agentic systems

intellectual property

watermarking

imitation attacks

grey-box

Innovation

Methods, ideas, or system contributions that make the work stand out.

agent watermarking

intellectual property protection

semantic equivalence