AgentMark: Utility-Preserving Behavioral Watermarking for Agents

📅 2026-01-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of watermarking the high-level planning behaviors of large language model agents in multi-step tasks, such as tool selection and subgoal decisions, which existing content watermarking methods do not cover. To reconcile intellectual property protection with task utility under black-box deployment, the authors propose AgentMark, a framework that embeds recoverable, multi-bit watermarks directly at the planning decision layer. By eliciting the agent's behavior distribution and applying a distribution-preserving conditional sampling strategy, AgentMark supports watermark extraction and behavioral attribution in black-box API settings while preserving long-term task performance. Experiments across embodied reasoning, tool-use, and social interaction scenarios demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is publicly released.

📝 Abstract

LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
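The general idea of embedding a bit into a planning decision without shifting the behavior distribution can be illustrated with a small sketch. This is not the AgentMark algorithm; it swaps in plain keyed inverse-CDF sampling as a stand-in, and every name here (`keyed_uniform`, `embed_bit`, `recover_bit`, the example actions and the `task-42` context string) is hypothetical. It also assumes the decoder can see the same behavior distribution that was used at embedding time, for example because it was logged. Because the secret key makes the sampling number look uniform, the chosen action follows the elicited distribution when averaged over keys, whichever bit is embedded.

```python
import hashlib
import hmac
from typing import Dict, Optional


def keyed_uniform(key: bytes, context: str, bit: int) -> float:
    """Derive a pseudo-random number in [0, 1) from the key, step context, and payload bit."""
    digest = hmac.new(key, f"{context}|{bit}".encode("utf-8"), hashlib.sha256).digest()
    # Keep 53 bits so the value is exactly representable as a float below 1.0.
    return (int.from_bytes(digest[:7], "big") >> 3) / 2**53


def sample_action(probs: Dict[str, float], u: float) -> str:
    """Inverse-CDF sampling: return the first action whose cumulative mass exceeds u."""
    total = sum(probs.values())
    acc = 0.0
    chosen = None
    for action, p in probs.items():
        chosen = action
        acc += p / total
        if u < acc:
            break
    return chosen  # falls back to the last action if rounding leaves acc just under 1


def embed_bit(probs: Dict[str, float], key: bytes, context: str, bit: int) -> str:
    """Pick the planning action for this step, conditioned on one payload bit."""
    return sample_action(probs, keyed_uniform(key, context, bit))


def recover_bit(probs: Dict[str, float], key: bytes, context: str, observed: str) -> Optional[int]:
    """Return the bit consistent with the logged action, or None if both bits fit."""
    candidates = [b for b in (0, 1) if embed_bit(probs, key, context, b) == observed]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    key = b"demo-secret"  # hypothetical shared secret
    probs = {"search": 0.5, "calculator": 0.3, "finish": 0.2}  # hypothetical elicited distribution
    for step, bit in enumerate([1, 0, 1, 1]):
        ctx = f"task-42/step-{step}"
        action = embed_bit(probs, key, ctx, bit)
        print(step, bit, action, recover_bit(probs, key, ctx, action))
```

In this sketch a step is ambiguous whenever both bit values map to the same action, so `recover_bit` returns `None` for it. A practical scheme would need redundancy across steps to recover a multi-bit identifier from partial logs.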

Problem

Research questions and friction points this paper is trying to address.

behavioral watermarking

LLM-based agents

planning behavior

utility preservation

IP protection

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral watermarking

utility preservation

black-box agents