VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents

📅 2025-09-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing OS agents exhibit excessive execution behavior in untrusted real-world environments, posing significant security risks. To address this, we propose a query-driven human–agent collaboration framework that enables agents to autonomously determine when to initiate human interaction to ensure task reliability. Our approach makes three key contributions: (1) an active triadic interaction mechanism among human, agent, and GUI; (2) a two-stage meta-knowledge learning paradigm that decouples normal execution from risk-triggered querying; and (3) a hybrid architecture integrating multimodal large language models with reinforcement learning to enable dynamic, context-aware query decisions directly over graphical user interfaces. Experiments demonstrate that our method improves step-level success rate by 20.64% in untrusted scenarios while maintaining high efficiency in standard environments. The framework exhibits strong generalization across diverse applications and scalability to new tasks and interfaces.

📝 Abstract

With the rapid progress of multimodal large language models, operating system (OS) agents have become increasingly capable of automating tasks through on-device graphical user interfaces (GUIs). However, most existing OS agents are designed for idealized settings, whereas real-world environments often present untrustworthy conditions. To mitigate the risk of over-execution in such scenarios, we propose a query-driven human-agent-GUI interaction framework that enables OS agents to decide when to query humans for more reliable task completion. Built upon this framework, we introduce VeriOS-Agent, a trustworthy OS agent trained with a two-stage learning paradigm that facilitates the decoupling and utilization of meta-knowledge. Concretely, VeriOS-Agent autonomously executes actions in normal conditions while proactively querying humans in untrustworthy scenarios. Experiments show that VeriOS-Agent improves the average step-wise success rate by 20.64% in untrustworthy scenarios over the state-of-the-art, without compromising normal performance. Analysis highlights VeriOS-Agent's rationality, generalizability, and scalability. The code, datasets, and models are available at https://github.com/Wuzheng02/VeriOS.
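
To make the reported metric concrete, a step-wise success rate can be read as the fraction of steps on which the agent's output matches the reference for that step. The abstract does not state the exact scoring rule, so the sketch below assumes exact string matching. The names `Step` and `step_success_rate` and the sample data are hypothetical and are not taken from the VeriOS repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    predicted: str  # what the agent produced at this step (action or query)
    expected: str   # reference output for this step


def step_success_rate(steps: Sequence[Step]) -> float:
    """Fraction of steps whose predicted output equals the reference.

    Assumption: exact string match; the paper may score steps differently.
    """
    if not steps:
        return 0.0
    hits = sum(1 for s in steps if s.predicted == s.expected)
    return hits / len(steps)


# Made-up episode: the agent should have queried the human on step 2.
episode = [
    Step(predicted="CLICK submit", expected="CLICK submit"),
    Step(predicted="CLICK ad_banner", expected="QUERY human"),
    Step(predicted="TYPE hello", expected="TYPE hello"),
    Step(predicted="SCROLL down", expected="SCROLL down"),
]
rate = step_success_rate(episode)
print(f"step-wise success rate: {rate:.2%}")
```

Under this reading, an agent that acts when it should have asked loses credit for that step, which is the over-execution failure the abstract describes.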

Problem

Research questions and friction points this paper is trying to address.

Addresses untrustworthy conditions in OS agent environments

Proposes query-driven human-agent-GUI interaction framework

Enables proactive human queries for reliable task completion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Query-driven human-agent-GUI interaction framework

Two-stage learning paradigm for meta-knowledge

Proactive human querying in untrustworthy scenarios
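
The query-driven interaction listed above can be pictured as a single decision step: the agent either emits a GUI action directly or first asks the human and then acts on the answer. This is a minimal sketch under stated assumptions, not the VeriOS-Agent implementation. `Decision`, `toy_policy`, `run_step`, the keyword check, and the action strings are all hypothetical; in the paper the decision comes from a trained model rather than a rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    kind: str     # "act" or "query"
    payload: str  # GUI action to execute, or the question for the human


Policy = Callable[[str, Optional[str]], Decision]


def toy_policy(screen: str, human_answer: Optional[str]) -> Decision:
    """Stand-in for the agent: a keyword rule flags an untrustworthy screen."""
    suspicious = "popup" in screen.lower()
    if suspicious and human_answer is None:
        return Decision("query", "An unexpected popup appeared. Dismiss it?")
    if suspicious:
        # Act according to what the human said.
        return Decision("act", "CLICK close" if human_answer == "yes" else "WAIT")
    return Decision("act", "CLICK submit")


def run_step(screen: str, policy: Policy, ask_human: Callable[[str], str]) -> str:
    """Return the GUI action for this step, querying the human if needed."""
    decision = policy(screen, None)
    if decision.kind == "query":
        answer = ask_human(decision.payload)
        decision = policy(screen, answer)
        if decision.kind != "act":
            raise ValueError("policy must act once the human has answered")
    return decision.payload


# Normal screen: the agent acts without involving the human.
normal_action = run_step("Login form", toy_policy, lambda q: "yes")
# Untrustworthy screen: the agent asks first, then follows the answer.
risky_action = run_step("Popup: claim your prize", toy_policy, lambda q: "yes")
print(normal_action, "|", risky_action)
```

Keeping the human behind a callback (`ask_human`) means the same loop can run with a real prompt, a scripted answer in tests, or a simulated user during evaluation.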

🔎 Similar Papers

Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents