AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems

📅 2025-12-03

🤖 AI Summary

Autonomous agents driven by large language models (LLMs) are not robust enough in anomalous real-world scenarios and therefore need human-in-the-loop (HITL) collaboration. To address this, the paper proposes a sandboxed execution framework that supports hybrid human-machine takeover. It introduces: (1) an Adaptive Streaming Protocol (ASP) that blends command-based and video-based streaming to enable low-latency, reliable task handover over heterogeneous networks; (2) a cross-platform virtualized sandbox environment that combines this hybrid transmission with programmable APIs and manual control interfaces; and (3) lightweight AI integration via the Model Context Protocol (MCP) standard and open-source SDKs. The evaluation reports a 48.3% improvement in task success rate, a bandwidth reduction of up to 50% relative to standard RDP, and a 5.1% decrease in end-to-end latency. System stability and user experience are also reported to improve under weak-network conditions.

📝 Abstract

The rapid advancement of Large Language Models (LLMs) is catalyzing a shift towards autonomous AI agents capable of executing complex, multi-step tasks. However, these agents remain brittle when faced with real-world exceptions, making Human-in-the-Loop (HITL) supervision essential for mission-critical applications. In this paper, we present AgentBay, a novel sandbox service designed from the ground up for hybrid interaction. AgentBay provides secure, isolated execution environments spanning Windows, Linux, Android, web browsers, and code interpreters. Its core contribution is a unified session accessible via a hybrid control interface: an AI agent can interact programmatically via mainstream interfaces (MCP, open-source SDK), while a human operator can, at any moment, take over full manual control. This seamless intervention is enabled by the Adaptive Streaming Protocol (ASP). Unlike traditional VNC/RDP, ASP is specifically engineered for this hybrid use case, delivering an ultra-low-latency, smoother user experience that remains resilient even in weak network environments. It achieves this by dynamically blending command-based and video-based streaming, adapting its encoding strategy to network conditions and to the current controller (AI or human). Our evaluation demonstrates strong results in security, performance, and task completion rates. In a benchmark of complex tasks, the AgentBay (Agent + Human) model improved the task success rate by more than 48%. Furthermore, ASP reduces bandwidth consumption by up to 50% compared to standard RDP and cuts end-to-end latency by around 5%, especially under poor network conditions. We posit that AgentBay provides a foundational primitive for building the next generation of reliable, human-supervised autonomous systems.
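The abstract says ASP adapts its encoding strategy to network conditions and to whoever currently holds control, but it does not give the actual decision rule. The sketch below is only an illustration of what such a selector could look like. Every name in it (`Controller`, `StreamMode`, `NetworkStats`, `choose_stream_mode`) and both thresholds are assumptions made for this example, not part of ASP or any AgentBay SDK.

```python
from dataclasses import dataclass
from enum import Enum


class Controller(Enum):
    AI = "ai"
    HUMAN = "human"


class StreamMode(Enum):
    COMMAND = "command"  # send compact commands/events only
    VIDEO = "video"      # send encoded video frames
    HYBRID = "hybrid"    # mix commands with reduced video


@dataclass
class NetworkStats:
    bandwidth_kbps: float  # estimated available bandwidth
    loss_rate: float       # packet loss as a fraction in [0, 1]


def choose_stream_mode(
    controller: Controller,
    stats: NetworkStats,
    low_bandwidth_kbps: float = 1500.0,  # assumed threshold, not from the paper
    high_loss: float = 0.05,             # assumed threshold, not from the paper
) -> StreamMode:
    """Pick a streaming mode from the controller and network state.

    Hypothetical policy: an AI controller does not need a smooth picture,
    so it gets the command stream; a human gets video on a good link and
    a command/video blend on a weak one.
    """
    if controller is Controller.AI:
        return StreamMode.COMMAND
    weak_link = (
        stats.bandwidth_kbps < low_bandwidth_kbps
        or stats.loss_rate > high_loss
    )
    return StreamMode.HYBRID if weak_link else StreamMode.VIDEO
```

Under these assumed thresholds, a human operator on a 500 kbps link would be served the hybrid stream, while the same operator on a clean 10 Mbps link would be served plain video.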

Problem

Research questions and friction points this paper is trying to address.

Enables seamless human-AI intervention in autonomous agent systems

Provides a secure sandbox for hybrid interaction across multiple platforms

Reduces latency and bandwidth for reliable control in weak networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid sandbox for seamless human-AI intervention

Adaptive Streaming Protocol for low-latency control

Unified session with programmatic and manual interfaces
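The unified session idea, where an agent drives the environment programmatically and a human can take over at any moment, can be pictured as a single session object with one active controller. The following is a minimal sketch under that assumption. `HybridSession` and its methods are hypothetical names invented for this example and do not reflect the real AgentBay SDK or MCP interface.

```python
import threading


class HybridSession:
    """Toy model of one sandbox session shared by an agent and a human."""

    def __init__(self):
        self._lock = threading.Lock()  # guards controller switches and the log
        self._controller = "ai"        # the agent holds control by default
        self.log = []                  # (controller, action) pairs, in order

    @property
    def controller(self):
        with self._lock:
            return self._controller

    def human_take_over(self):
        """The human claims control; later agent commands are refused."""
        with self._lock:
            self._controller = "human"

    def hand_back(self):
        """The human returns control to the agent."""
        with self._lock:
            self._controller = "ai"

    def agent_execute(self, command):
        """Apply an agent command only while the agent holds control."""
        with self._lock:
            if self._controller != "ai":
                return False
            self.log.append(("ai", command))
            return True

    def human_input(self, event):
        """Apply a manual input event only while the human holds control."""
        with self._lock:
            if self._controller != "human":
                return False
            self.log.append(("human", event))
            return True
```

In this sketch the takeover is just a guarded state switch, so both parties act on the same session state and the agent can resume from wherever the human left off.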
