🤖 AI Summary
This work addresses the challenge of watermarking the high-level planning behaviors of large language model agents in multi-step tasks (e.g., tool selection and subgoal decisions), which existing content watermarking methods do not directly identify. To reconcile intellectual property protection with task utility under black-box deployment, we propose AgentMark, a framework that embeds recoverable multi-bit watermarks directly at the planning decision layer. By eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, AgentMark supports watermark extraction and behavioral attribution in black-box API settings while preserving long-term task performance. Experiments across embodied reasoning, tool-use, and social interaction scenarios demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is publicly released.
📝 Abstract
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
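As an illustration of what "distribution-preserving" sampling can mean, the following minimal sketch draws an action by inverse-CDF sampling with a pseudorandom number derived from a secret key and the step context. It is not AgentMark's algorithm and does not implement multi-bit encoding; it only shows a keyed, detectable choice whose marginal distribution matches the elicited one, provided the keyed number behaves like a uniform draw. All names (`keyed_uniform`, `sample_action`, `choose`, `match_rate`), the HMAC construction, and the log format are assumptions made for this example.

```python
import hashlib
import hmac
from typing import Mapping, Sequence, Tuple


def keyed_uniform(key: bytes, context: str) -> float:
    """Map (key, context) to a pseudorandom number in [0, 1).

    Hypothetical construction: the top 53 bits of an HMAC-SHA256 digest.
    """
    digest = hmac.new(key, context.encode("utf-8"), hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_action(probs: Mapping[str, float], u: float) -> str:
    """Inverse-CDF sampling: return the action whose cumulative interval contains u.

    If u is uniform on [0, 1), each action is returned with its own probability,
    so the elicited behavior distribution is preserved.
    """
    total = sum(probs.values())
    cumulative = 0.0
    last = ""
    for action in sorted(probs):  # fixed order so embedder and verifier agree
        cumulative += probs[action] / total
        last = action
        if u < cumulative:
            return action
    return last  # guard against floating-point rounding at the upper edge


def choose(key: bytes, context: str, probs: Mapping[str, float]) -> str:
    """Keyed choice for one planning step."""
    return sample_action(probs, keyed_uniform(key, context))


def match_rate(key: bytes, log: Sequence[Tuple[str, Mapping[str, float], str]]) -> float:
    """Fraction of logged steps whose action equals the keyed choice.

    Each log entry is (context, elicited distribution, action taken).
    """
    hits = sum(choose(key, ctx, probs) == action for ctx, probs, action in log)
    return hits / len(log)


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret
    dist = {"search": 0.5, "calculator": 0.3, "finish": 0.2}
    log = [(f"step-{i}", dist, choose(key, f"step-{i}", dist)) for i in range(20)]
    print(match_rate(key, log))
```

Under these assumptions, a verifier holding the key and the logged distributions sees every step match, while a log produced without the key would match only at roughly chance level. Because each step is checked independently, the check still applies when only part of the log is available.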