🤖 AI Summary
Current AI agents struggle to understand and interact with visual analytics (VA) interfaces accurately and efficiently because they rely on computer vision and raw DOM access. This work formally specifies what agents require in VA scenarios, covering application state, available interactions, and execution semantics, and introduces a general, extensible “agent-ready” context protocol, the Visual Analytics Context Protocol (VACP). The protocol exposes application state and action interfaces in structured form and is implemented as a library compatible with major visualization grammars and web frameworks, which supports both augmenting existing systems and developing new ones. In an evaluation on representative VA tasks, agents using the protocol achieve higher task success rates while consuming fewer tokens and responding with lower latency.
📝 Abstract
The rise of AI agents introduces a fundamental shift in Visual Analytics (VA), in which agents act as a new user group. Current agentic approaches, which rely on computer vision and raw DOM access, fail to perform VA tasks accurately and efficiently. This paper introduces the Visual Analytics Context Protocol (VACP), a framework that makes VA applications "agent-ready" by extending generic protocols to explicitly expose application state, available interactions, and mechanisms for direct execution. To support the protocol, we contribute a formal specification of AI agent requirements and knowledge representations in VA interfaces. We instantiate VACP as a library compatible with major visualization grammars and web frameworks, enabling both the augmentation of existing systems and the development of new ones. Our evaluation across representative VA tasks demonstrates that VACP-enabled agents achieve higher success rates in interface interpretation and execution than current agentic approaches, while reducing token consumption and latency. VACP closes the gap between human-centric VA interfaces and machine perceivability, so that agents can reliably act as collaborative users in VA systems.
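To make the three ingredients named in the abstract concrete (exposed application state, advertised interactions, and direct execution), the following is a minimal sketch in Python. It is not the VACP library or its API: the names `Action`, `AgentContext`, `describe`, `execute`, and `set_filter`, as well as the example chart state, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Action:
    """One interaction the application advertises to an agent (hypothetical shape)."""
    name: str
    description: str
    params: Dict[str, str]          # parameter name -> type, written as text
    handler: Callable[..., None]    # invoked when the agent executes the action


@dataclass
class AgentContext:
    """Toy stand-in for an agent-ready context; NOT the actual VACP API."""
    state: Dict[str, Any] = field(default_factory=dict)
    actions: Dict[str, Action] = field(default_factory=dict)

    def register(self, action: Action) -> None:
        self.actions[action.name] = action

    def describe(self) -> Dict[str, Any]:
        """Structured snapshot an agent can read instead of pixels or raw DOM."""
        return {
            "state": dict(self.state),
            "actions": [
                {"name": a.name, "description": a.description, "params": a.params}
                for a in self.actions.values()
            ],
        }

    def execute(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        """Run an advertised action directly and return the updated state."""
        if name not in self.actions:
            raise KeyError(f"unknown action: {name}")
        action = self.actions[name]
        unexpected = set(kwargs) - set(action.params)
        if unexpected:
            raise TypeError(f"unexpected parameters: {sorted(unexpected)}")
        action.handler(**kwargs)
        return dict(self.state)


# Hypothetical application: a scatter plot with an optional filter.
ctx = AgentContext(
    state={"chart": "scatter", "x": "gdp", "y": "life_expectancy", "filter": None}
)


def set_filter(field_name: str, min_value: float) -> None:
    # Update application state the same way a UI control would.
    ctx.state["filter"] = {"field": field_name, "min": min_value}


ctx.register(
    Action(
        name="set_filter",
        description="Keep only points whose field is at least min_value.",
        params={"field_name": "str", "min_value": "float"},
        handler=set_filter,
    )
)

snapshot = ctx.describe()  # what the agent reads before acting
new_state = ctx.execute("set_filter", field_name="gdp", min_value=10000.0)
```

In this sketch the agent never inspects rendered output: it reads `snapshot` to learn the current state and the available actions, then calls `execute` by name. That is the general contrast with vision-based or DOM-based approaches that the abstract draws, shown here under the stated assumptions only.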