Risky-Bench: Probing Agentic Safety Risks under Real-World Deployment

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Safety evaluations for large language model agents currently offer limited coverage and struggle to capture the multidimensional risks of complex, long-horizon real-world deployments. This work proposes Risky-Bench, a scalable, domain-agnostic framework for systematic safety assessment that derives context-aware safety rubrics from general safety principles. By combining realistic task execution, varying threat assumptions, and a structured evaluation pipeline, the framework enables comprehensive risk assessment across diverse scenarios and extended interactive sessions. Empirical evaluation in practical settings, such as life-assist (personal assistant) agents, reveals substantial safety vulnerabilities in state-of-the-art agents and demonstrates the framework's adaptability across deployment environments.
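
The paper does not publish an API, but the principle-to-rubric derivation step can be pictured with a minimal sketch. All names here (`SafetyPrinciple`, `SafetyRubric`, `derive_rubrics`, `llm_complete`) are hypothetical stand-ins, and the prompt shape is an assumption about how grounding a general principle in a deployment context could work.

```python
# Hypothetical sketch of the rubric-derivation step; names and prompt
# format are illustrative, not taken from the Risky-Bench codebase.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SafetyPrinciple:
    name: str          # e.g. "avoid irreversible actions"
    description: str   # domain-agnostic statement of the principle

@dataclass
class SafetyRubric:
    principle: str     # source principle the rubric instantiates
    context: str       # deployment context it applies to
    criterion: str     # concrete, checkable safety requirement

def derive_rubrics(
    principles: List[SafetyPrinciple],
    deployment_context: str,
    llm_complete: Callable[[str], str],
) -> List[SafetyRubric]:
    """Instantiate general safety principles as context-aware rubrics.

    `llm_complete` is any text-completion callable returning one
    checkable criterion per line.
    """
    rubrics = []
    for p in principles:
        prompt = (
            f"Deployment context: {deployment_context}\n"
            f"Safety principle: {p.name} -- {p.description}\n"
            "List concrete, checkable safety criteria an agent must "
            "satisfy in this context, one per line."
        )
        for line in llm_complete(prompt).splitlines():
            if line.strip():
                rubrics.append(
                    SafetyRubric(p.name, deployment_context, line.strip())
                )
    return rubrics
```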

📝 Abstract
Large Language Models (LLMs) are increasingly deployed as agents that operate in real-world environments, introducing safety risks beyond linguistic harm. Existing agent safety evaluations rely on risk-oriented tasks tailored to specific agent settings, resulting in limited coverage of the safety risk space and failing to assess agent safety behavior during long-horizon, interactive task execution in complex real-world deployments. Moreover, their specialization to particular agent settings limits adaptability across diverse agent configurations. To address these limitations, we propose Risky-Bench, a framework that enables systematic agent safety evaluation grounded in real-world deployment. Risky-Bench organizes evaluation around domain-agnostic safety principles to derive context-aware safety rubrics that delineate the safety space, and it systematically evaluates safety risks across this space through realistic task execution under varying threat assumptions. When applied to life-assist agent settings, Risky-Bench uncovers substantial safety risks in state-of-the-art agents under realistic execution conditions. Moreover, as a well-structured evaluation pipeline, Risky-Bench is not confined to life-assist scenarios; it can be adapted to other deployment settings to construct environment-specific safety evaluations, providing an extensible methodology for agent safety assessment.
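
To make the evaluation loop concrete, here is a minimal sketch of scoring realistic task execution under varying threat assumptions. Everything in it is an assumption for illustration: `agent.run`, `ThreatAssumption`, and `judge` are hypothetical, and `rubrics` reuses the `SafetyRubric` shape from the sketch above rather than any published Risky-Bench interface.

```python
# Illustrative sketch only; the paper does not publish this API.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ThreatAssumption:
    name: str           # e.g. "benign user" or "prompt-injected tool output"
    perturb: Callable   # transforms a task/environment under this threat

@dataclass
class EvalResult:
    task_id: str
    threat: str
    violations: List[str] = field(default_factory=list)

def evaluate(agent, tasks, threats, rubrics, judge) -> List[EvalResult]:
    """Run every realistic task under every threat assumption, then score
    the full interaction trajectory against each rubric.

    Assumptions: `agent.run` returns the complete long-horizon trajectory,
    and `judge(trajectory, rubric)` returns True when the rubric's
    criterion is violated anywhere in that trajectory.
    """
    results = []
    for task in tasks:
        for threat in threats:
            trajectory = agent.run(threat.perturb(task))
            violated = [r.criterion for r in rubrics if judge(trajectory, r)]
            results.append(EvalResult(task.id, threat.name, violated))
    return results
```

Scoring whole trajectories, rather than single responses, is what lets a pipeline like this surface risks that only emerge during extended interactive sessions.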
Problem

Research questions and friction points this paper is trying to address.

agent safety
real-world deployment
safety evaluation
large language models
interactive task execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

agent safety evaluation
real-world deployment
safety risk assessment
context-aware rubrics
extensible evaluation framework