🤖 AI Summary
Current website governance mechanisms struggle to effectively regulate the complex interactions of large language model (LLM)-driven web agents, leading sites to over-rely on blocking and CAPTCHAs, which inadvertently hinder beneficial automation. This work proposes a lightweight permission manifest mechanism, agent-permissions.json, extending the principles of robots.txt into the era of LLM agents by providing the first standardized, low-friction framework for expressing and enforcing agent permissions. Formatted as structured JSON and integrated with API references, this mechanism enables agents to automatically parse and comply with site policies, granting website owners fine-grained control. It reduces erroneous blocking while facilitating the deployment of compliant applications such as efficient automation tools, e-commerce services, and accessibility aids.
📝 Abstract
The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Unlike traditional crawlers that follow simple conventions, such as robots.txt, modern agents engage with websites in sophisticated ways: navigating complex interfaces, extracting structured information, and completing end-to-end tasks. Existing governance mechanisms were not designed for these capabilities. Without a way to specify what interactions are and are not allowed, website owners increasingly rely on blanket blocking and CAPTCHAs, which undermine beneficial applications such as efficient automation, convenient use of e-commerce services, and accessibility tools. We introduce agent-permissions.json, a robots.txt-style lightweight manifest where websites specify allowed interactions, complemented by API references where available. This framework provides a low-friction coordination mechanism: website owners only need to write a simple JSON file, while agents can easily parse and automatically implement the manifest's provisions. Website owners can then focus on blocking non-compliant agents, rather than agents as a whole. By extending the spirit of robots.txt to the era of LLM-mediated interaction, and complementing data use initiatives such as AIPref, the manifest establishes a compliance framework that enables beneficial agent interactions while respecting site owners' preferences.
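To make the coordination mechanism concrete, the following is a minimal sketch of how an agent might parse such a manifest and check an interaction before performing it. The abstract does not give the actual schema, so every field name here (`rules`, `path`, `actions`, `api`), the action names, the example URL, and the longest-prefix matching rule are illustrative assumptions rather than the format the paper defines.

```python
import json

# Hypothetical manifest contents. The field names and action names are
# assumptions for illustration only, not the schema defined by the paper.
EXAMPLE_MANIFEST = """
{
  "rules": [
    {"path": "/search", "actions": ["read", "navigate"]},
    {"path": "/checkout", "actions": []}
  ],
  "api": "https://example.com/openapi.json"
}
"""


def allowed_actions(manifest: dict, path: str) -> set:
    """Return the actions permitted on `path`, using the longest matching prefix."""
    best = None
    for rule in manifest.get("rules", []):
        prefix = rule["path"]
        if path.startswith(prefix) and (best is None or len(prefix) > len(best["path"])):
            best = rule
    if best is None:
        # Assumption of this sketch: deny when no rule covers the path.
        return set()
    return set(best["actions"])


def is_allowed(manifest: dict, path: str, action: str) -> bool:
    """Check a single (path, action) pair against the manifest."""
    return action in allowed_actions(manifest, path)


manifest = json.loads(EXAMPLE_MANIFEST)
print(is_allowed(manifest, "/search/results", "read"))   # path covered by the /search rule
print(is_allowed(manifest, "/checkout", "submit_form"))  # /checkout lists no actions
```

Denying by default when no rule matches is a conservative choice made for this sketch; a real manifest format may define its own default behavior.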