SafePred: A Predictive Guardrail for Computer-Using Agents via World Models

📅 2026-02-02
🤖 AI Summary
Existing agent safety mechanisms are predominantly reactive, rendering them ill-suited to address long-term high-risk behaviors whose adverse effects manifest with significant delay. This work proposes SafePred, a predictive safety framework that establishes, for the first time, a closed-loop integration of risk assessment and decision-making. By leveraging a world model for semantic-level long-horizon risk prediction, SafePred embeds risk evaluation directly into the agent's planning process, enabling both step-level interventions and task-level replanning. The approach transcends the limitations of conventional passive safeguards, achieving over 97.6% safety performance by substantially reducing high-risk behaviors while improving task utility by up to 21.4% compared to baseline methods.

📝 Abstract
With the widespread deployment of Computer-using Agents (CUAs) in complex real-world environments, prevalent long-term risks often lead to severe and irreversible consequences. Most existing guardrails for CUAs adopt a reactive approach, constraining agent behavior only within the current observation space. While these guardrails can prevent immediate short-term risks (e.g., clicking on a phishing link), they cannot proactively avoid long-term risks: seemingly reasonable actions can have high-risk consequences that emerge only after a delay (e.g., clearing logs makes future audits untraceable), and reactive guardrails cannot identify such consequences within the current observation space. To address these limitations, we propose a predictive guardrail approach, whose core idea is to align predicted future risks with current decisions. Based on this approach, we present SafePred, a predictive guardrail framework for CUAs that establishes a risk-to-decision loop to ensure safe agent behavior. SafePred supports two key abilities: (1) Short- and long-term risk prediction: using safety policies as the basis for risk prediction, SafePred leverages the prediction capability of the world model to generate semantic representations of both short-term and long-term risks, thereby identifying and pruning actions that lead to high-risk states; (2) Decision optimization: translating predicted risks into actionable safe-decision guidance through step-level interventions and task-level re-planning. Extensive experiments show that SafePred significantly reduces high-risk behaviors, achieving over 97.6% safety performance and improving task utility by up to 21.4% compared with reactive baselines.
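The risk-to-decision loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the rule-based `toy_world_model`, the `audit_policy` scoring function, the `guard` function, the `Decision` record, and the 0.5 threshold are all hypothetical names and values chosen for illustration, and the paper's world model produces semantic predictions rather than fixed rules. The sketch only shows how predicting several steps ahead lets a guardrail prune an action that looks harmless at the current step, and how an empty set of safe actions could signal task-level re-planning.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

State = FrozenSet[str]
# Hypothetical world-model signature: (state, action, horizon) -> predicted states.
WorldModel = Callable[[State, str, int], List[State]]
# Hypothetical safety policy: maps a predicted state to a risk score in [0, 1].
SafetyPolicy = Callable[[State], float]


@dataclass
class Decision:
    action: Optional[str]  # chosen action, or None if every candidate was pruned
    pruned: List[str]      # candidates rejected as high-risk
    replan: bool           # True when task-level re-planning is needed


def toy_world_model(state: State, action: str, horizon: int) -> List[State]:
    """Toy stand-in for a learned world model: rule-based rollout of one action."""
    current = set(state)
    rollout = []
    for step in range(1, horizon + 1):
        if action == "delete_logs":
            if step == 1:
                current.discard("logs_present")
            if step >= 3:  # the harm only becomes visible after a delay
                current.add("audit_untraceable")
        elif action == "archive_logs" and step == 1:
            current.add("logs_archived")
        rollout.append(frozenset(current))
    return rollout


def audit_policy(state: State) -> float:
    """Toy safety policy: untraceable audits count as maximal risk."""
    return 1.0 if "audit_untraceable" in state else 0.0


def predicted_risk(state: State, action: str, world_model: WorldModel,
                   policies: List[SafetyPolicy], horizon: int) -> float:
    """Highest policy risk over all states predicted within the horizon."""
    return max(
        (policy(s) for s in world_model(state, action, horizon) for policy in policies),
        default=0.0,
    )


def guard(state: State, candidates: List[str], world_model: WorldModel,
          policies: List[SafetyPolicy], horizon: int,
          threshold: float = 0.5) -> Decision:
    """Prune high-risk candidates; request re-planning if none survive."""
    safe, pruned = [], []
    for action in candidates:
        risk = predicted_risk(state, action, world_model, policies, horizon)
        (pruned if risk >= threshold else safe).append(action)
    if safe:
        return Decision(action=safe[0], pruned=pruned, replan=False)
    return Decision(action=None, pruned=pruned, replan=True)


if __name__ == "__main__":
    start = frozenset({"logs_present"})
    options = ["delete_logs", "archive_logs"]
    # Horizon 1 mimics a reactive check; horizon 3 looks far enough ahead.
    print(guard(start, options, toy_world_model, [audit_policy], horizon=1))
    print(guard(start, options, toy_world_model, [audit_policy], horizon=3))
```

With a one-step horizon the toy guard behaves like a reactive check and lets `delete_logs` through; with a three-step horizon the delayed audit risk is predicted, the action is pruned, and `archive_logs` is selected instead.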
Problem

Research questions and friction points this paper is trying to address.

Computer-using Agents
long-term risks
reactive guardrails
delayed consequences
safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive guardrail
world model
long-term risk prediction
computer-using agents
decision optimization
Yurun Chen
Master Student of Science, Tsinghua University
3D vision
Zeyi Liao
The Ohio State University
AI, NLP, Multimodal, Agent
Ping Yin
Inspur Cloud, China
Taotao Xie
Inspur Cloud, China
Keting Yin
Zhejiang University, China
Shengyu Zhang
Zhejiang University, China