🤖 AI Summary
This work addresses the inefficiency of current end-user AI agents in human-AI interaction, which stems from the absence of structured feedback mechanisms and high barriers to entry for non-expert users. To overcome these limitations, the authors propose AgentClick, a visual human-AI collaborative review framework tailored for end-user AI agents. The framework makes agent operations inspectable, intervenable, and traceable through a browser-based interface, while supporting remote access and persistent preference memory. Built on a localhost npm server, the system uses a skill-based plugin to connect the running agent to a Web UI over HTTP, and provides execution trajectory visualization and editable memory. The approach aims to lower the usability barrier for non-expert users and to improve both the efficiency of human-AI collaboration and the quality of task execution.
📝 Abstract
Recent autonomous AI agents such as Codex and Claude Code have made it increasingly practical for users to delegate complex tasks, including writing emails, executing code, issuing shell commands, and carrying out multi-step plans. Despite these capabilities, human-agent interaction still largely happens through terminal interfaces or remote text-based channels such as Discord. These interaction modes are often inefficient and unfriendly: long text outputs are difficult to read and review, proposed actions lack clear structure and visual context, and users must express feedback by typing detailed corrections, which is cumbersome and often discourages effective collaboration. As a result, non-expert users in particular face a high barrier to working productively with agents. To address this gap, we present AgentClick, an interactive review layer for terminal-based agents. AgentClick is implemented as a localhost npm server paired with a skill-based plugin that connects the running agent to a browser interface, allowing users to supervise and collaborate with agents through a structured web UI rather than raw terminal text alone. The system supports a range of human-in-the-loop workflows, including email drafting and revision, plan review and modification, memory management, trajectory inspection and visualization, and error localization during agent execution. It also turns code generation and execution into a reviewable process, enabling users to inspect and intervene before consequential actions are taken. In addition, AgentClick supports persistent preference capture through editable memory and remote access over HTTP, allowing users to review agents running on servers from their personal devices. Our goal is to lower the barrier for non-expert users and improve the efficiency and quality of human-agent collaboration.
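As a rough illustration of the review-before-execute workflow described above, the following sketch models a review gate in plain Python. It is not AgentClick's implementation: the names `ReviewGate`, `Proposal`, `Status`, and the `runner` callback are hypothetical, and the real system exchanges this information between the agent and a browser UI over HTTP rather than through in-process calls. The sketch only shows the core idea that a proposed action stays pending until a human approves it (optionally after editing it) or rejects it with feedback, and that each step is recorded in a trajectory.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Callable, Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """One action the agent wants to take (hypothetical structure)."""
    id: int
    kind: str                      # e.g. "email", "shell", "plan"
    payload: str                   # the draft text or command to review
    status: Status = Status.PENDING
    feedback: Optional[str] = None


class ReviewGate:
    """Holds proposed actions until a human reviewer decides on them."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._proposals: Dict[int, Proposal] = {}
        self.trajectory: List[str] = []   # ordered log for later inspection

    def propose(self, kind: str, payload: str) -> Proposal:
        proposal = Proposal(id=next(self._ids), kind=kind, payload=payload)
        self._proposals[proposal.id] = proposal
        self.trajectory.append(f"proposed #{proposal.id} ({kind})")
        return proposal

    def approve(self, pid: int, edited_payload: Optional[str] = None) -> None:
        proposal = self._proposals[pid]
        if edited_payload is not None:
            # The reviewer may correct the action instead of retyping feedback.
            proposal.payload = edited_payload
        proposal.status = Status.APPROVED
        self.trajectory.append(f"approved #{pid}")

    def reject(self, pid: int, feedback: str) -> None:
        proposal = self._proposals[pid]
        proposal.status = Status.REJECTED
        proposal.feedback = feedback
        self.trajectory.append(f"rejected #{pid}: {feedback}")

    def execute(self, pid: int, runner: Callable[[str], str]) -> str:
        proposal = self._proposals[pid]
        if proposal.status is not Status.APPROVED:
            # Unreviewed or rejected actions never reach the runner.
            raise PermissionError(f"proposal #{pid} is {proposal.status.value}")
        result = runner(proposal.payload)
        self.trajectory.append(f"executed #{pid}")
        return result
```

In this sketch an agent would call `propose(...)`, wait for the reviewer to call `approve(...)` or `reject(...)`, and only then call `execute(...)`. Keeping the gate separate from the `runner` means the same review step could sit in front of email sending, shell commands, or plan execution.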