SafePred: A Predictive Guardrail for Computer-Using Agents via World Models

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing agent safety mechanisms are predominantly reactive, which makes them ill-suited to long-term high-risk behaviors whose adverse effects appear only after a significant delay. This work proposes SafePred, a predictive safety framework that establishes, for the first time, a closed-loop integration of risk assessment and decision-making. By using a world model for semantic-level long-horizon risk prediction, SafePred embeds risk evaluation directly into the agent’s planning process, enabling both step-level interventions and task-level replanning. The approach substantially reduces high-risk behaviors, achieving over 97.6% safety performance while improving task utility by up to 21.4% compared with reactive baselines.


📝 Abstract

With the widespread deployment of Computer-using Agents (CUAs) in complex real-world environments, prevalent long-term risks often lead to severe and irreversible consequences. Most existing guardrails for CUAs adopt a reactive approach, constraining agent behavior only within the current observation space. While these guardrails can prevent immediate short-term risks (e.g., clicking on a phishing link), they cannot proactively avoid long-term risks: seemingly reasonable actions can lead to high-risk consequences that emerge with a delay (e.g., clearing logs makes future audits untraceable), which reactive guardrails cannot identify within the current observation space. To address these limitations, we propose a predictive guardrail approach whose core idea is to align predicted future risks with current decisions. Based on this approach, we present SafePred, a predictive guardrail framework for CUAs that establishes a risk-to-decision loop to ensure safe agent behavior. SafePred supports two key abilities: (1) Short- and long-term risk prediction: using safety policies as the basis for risk prediction, SafePred leverages the prediction capability of the world model to generate semantic representations of both short-term and long-term risks, thereby identifying and pruning actions that lead to high-risk states; (2) Decision optimization: translating predicted risks into actionable safe decision guidance through step-level interventions and task-level re-planning. Extensive experiments show that SafePred significantly reduces high-risk behaviors, achieving over 97.6% safety performance and improving task utility by up to 21.4% compared with reactive baselines.
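
The risk-to-decision loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lookup-table world model, the keyword-based `POLICIES`, the `threshold` value, and every function name below are hypothetical stand-ins chosen only to show the shape of the idea, namely predicting short- and long-term outcomes for each candidate action, pruning the high-risk ones, and signalling a re-plan when nothing safe remains.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Prediction:
    short_term: str  # predicted immediate outcome (semantic description)
    long_term: str   # predicted delayed outcome (semantic description)


# Hypothetical safety policies: keyword -> risk score in [0, 1].
POLICIES = {"phishing": 0.9, "untraceable": 0.8}


def toy_world_model(state: str, action: str) -> Prediction:
    """Stand-in for a learned world model: a fixed lookup table (assumption)."""
    table = {
        "click_unknown_link": Prediction("phishing page opens", "credentials leaked"),
        "clear_logs": Prediction("disk space freed", "future audits untraceable"),
        "archive_logs": Prediction("logs compressed", "audits remain possible"),
    }
    return table.get(action, Prediction("no visible change", "no known effect"))


def risk_score(pred: Prediction) -> float:
    """Score a prediction against the policies; highest matching risk wins."""
    text = f"{pred.short_term} {pred.long_term}"
    return max((s for k, s in POLICIES.items() if k in text), default=0.0)


def guard(
    state: str,
    candidates: List[str],
    world_model: Callable[[str, str], Prediction] = toy_world_model,
    threshold: float = 0.5,
) -> Tuple[Optional[str], str]:
    """Prune risky candidates (step level); ask for a re-plan if none survive."""
    scored = [(a, risk_score(world_model(state, a))) for a in candidates]
    safe = [(a, r) for a, r in scored if r < threshold]
    if not safe:
        return None, "replan"  # task-level signal: no safe action available
    best_action, _ = min(safe, key=lambda pair: pair[1])
    return best_action, "proceed"
```

In this sketch, `guard("desktop", ["clear_logs", "archive_logs"])` prunes `clear_logs` because its predicted long-term outcome matches a policy, even though its short-term outcome looks harmless. That is the delayed-risk case a purely reactive check over the current observation would miss.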

Problem

Research questions and friction points this paper is trying to address.

Computer-using Agents

long-term risks

reactive guardrails

delayed consequences

safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive guardrail

world model

long-term risk prediction