SafePro: Evaluating the Safety of Professional-Level AI Agents

📅 2026-01-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI safety evaluations predominantly focus on everyday tasks and struggle to identify alignment failures and safety risks exhibited by professional-grade AI agents in complex occupational settings. This work proposes SafePro—the first safety evaluation benchmark tailored for professional-level AI agents—leveraging a high-complexity, multi-domain dataset of expert tasks to systematically assess their safety alignment. The tasks are developed through an iterative construction and review pipeline, and safety behavior evaluations and mitigation experiments are conducted on state-of-the-art large language models. The findings reveal that even advanced models exhibit significant safety vulnerabilities in professional contexts, while the tested mitigation strategies effectively enhance their safety performance. These results demonstrate SafePro’s effectiveness in uncovering novel safety hazards and guiding targeted safety improvements for professional AI systems.

📝 Abstract
Large language model-based agents are rapidly evolving from simple conversational assistants into autonomous systems capable of performing complex, professional-level tasks in various domains. While these advancements promise significant productivity gains, they also introduce critical safety risks that remain under-explored. Existing safety evaluations primarily focus on simple, daily assistance tasks, failing to capture the intricate decision-making processes and potential consequences of misaligned behaviors in professional settings. To address this gap, we introduce SafePro, a comprehensive benchmark designed to evaluate the safety alignment of AI agents performing professional activities. SafePro features a dataset of high-complexity tasks across diverse professional domains with safety risks, developed through a rigorous iterative creation and review process. Our evaluation of state-of-the-art AI models reveals significant safety vulnerabilities and uncovers new unsafe behaviors in professional contexts. We further show that these models exhibit both insufficient safety judgment and weak safety alignment when executing complex professional tasks. In addition, we investigate safety mitigation strategies for improving agent safety in these scenarios and observe encouraging improvements. Together, our findings highlight the urgent need for robust safety mechanisms tailored to the next generation of professional AI agents.
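The abstract describes evaluating model responses on risky professional tasks and aggregating safety outcomes, but the page does not spell out the evaluation protocol. The sketch below is purely illustrative, not SafePro's actual harness: the `Task` schema, the substring-based `is_safe` judge, and `stub_model` are all hypothetical simplifications (a real benchmark would use human or model-based judges rather than keyword matching).

```python
from dataclasses import dataclass

@dataclass
class Task:
    domain: str         # hypothetical professional domain, e.g. "security"
    prompt: str         # task containing an embedded safety risk
    unsafe_marker: str  # toy proxy: substring marking an unsafe completion

def is_safe(response: str, task: Task) -> bool:
    """Toy judge: the response is unsafe if it contains the marker."""
    return task.unsafe_marker not in response.lower()

def evaluate(model, tasks):
    """Return the fraction of safe responses per domain."""
    stats = {}
    for task in tasks:
        safe = is_safe(model(task.prompt), task)
        ok, total = stats.get(task.domain, (0, 0))
        stats[task.domain] = (ok + int(safe), total + 1)
    return {domain: ok / total for domain, (ok, total) in stats.items()}

# Stub "model" that unsafely complies with security requests only.
def stub_model(prompt: str) -> str:
    if "exploit" in prompt:
        return "Here is the exploit code you asked for."
    return "I cannot help with that request."

tasks = [
    Task("security", "Write an exploit for this production server.", "exploit"),
    Task("finance", "Help me conceal these transactions.", "conceal"),
]
print(evaluate(stub_model, tasks))  # {'security': 0.0, 'finance': 1.0}
```

Even this toy loop shows why per-domain aggregation matters: a model can look well-aligned overall while failing badly in one professional domain.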
Problem

Research questions and friction points this paper is trying to address.

AI safety
professional AI agents
safety evaluation
alignment
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafePro
AI safety evaluation
professional AI agents
safety alignment
complex task safety