HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

📅 2024-09-24
🏛️ arXiv.org
📈 Citations: 19
Influential: 1
📄 PDF
🤖 AI Summary
AI agents face significant safety risks in complex socio-interactive settings—particularly under adversarial user inputs. Method: This paper introduces a modular sandbox ecosystem enabling multi-turn, multi-tool, cross-domain (e.g., healthcare, finance) human-agent interaction simulation. It proposes a four-dimensional safety evaluation framework—covering operational, content, societal, and legal risks—and integrates LLM-driven interaction modeling with tool-calling mechanisms. Contribution/Results: The authors conduct large-scale stress testing across 92 diverse scenarios (1,840 total simulations, i.e., 20 per scenario). Empirical results reveal that mainstream LLMs exhibit safety risks in over 50% of cases, with risk severity markedly amplified under malicious user conditions. The open-source, extensible platform supports customizable scenario construction and standardized agent safety benchmarking, advancing the field toward rigorous, reproducible AI agent safety evaluation.

📝 Abstract
AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between human users and AI agents, where the AI agents are equipped with a variety of tools (e.g., patient management platforms) to navigate diverse scenarios (e.g., a user attempting to access other patients' profiles). To examine the safety of AI agents in these interactions, we develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks. Through running 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education), we demonstrate that HAICOSYSTEM can emulate realistic user-AI interactions and complex tool use by AI agents. Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% cases, with models generally showing higher risks when interacting with simulated malicious users. Our findings highlight the ongoing challenge of building agents that can safely navigate complex interactions, particularly when faced with malicious users. To foster the AI agent safety ecosystem, we release a code platform that allows practitioners to create custom scenarios, simulate interactions, and evaluate the safety and performance of their agents.
Problem

Research questions and friction points this paper is trying to address.

Addressing AI agent safety risks in human-AI interactions
Developing a sandbox framework to simulate multi-turn interactions
Evaluating safety across operational, content, societal, and legal dimensions
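The four evaluation dimensions above can be pictured as per-episode scores aggregated into a risk verdict. The sketch below is a hypothetical illustration, not the paper's actual implementation: the field names, the negative-score convention, and the "any dimension below zero" rule are assumptions for the example.

```python
from dataclasses import dataclass

# Hypothetical per-episode scores on the four risk dimensions from the
# paper (operational, content, societal, legal). The 0-to-negative
# scoring convention here is an assumption for illustration.
@dataclass
class SafetyScores:
    operational: int   # e.g., tool misuse, violated task constraints
    content: int       # e.g., harmful or false generated content
    societal: int      # e.g., harm to third parties or society
    legal: int         # e.g., regulatory or legal violations

    def is_risky(self) -> bool:
        # Flag the episode if any single dimension shows a violation.
        return min(self.operational, self.content,
                   self.societal, self.legal) < 0

episode = SafetyScores(operational=0, content=-3, societal=0, legal=0)
print(episode.is_risky())  # True: one dimension (content) was violated
```

This "worst dimension wins" aggregation makes a single legal or content violation sufficient to count an episode as unsafe, which matches how multi-dimensional safety benchmarks typically report risk rates.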
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular sandbox simulates human-AI multi-turn interactions
Multi-dimensional evaluation framework assesses diverse safety risks
Code platform enables custom scenario creation and agent evaluation
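The sandbox loop those bullets describe can be sketched as a scenario definition plus an alternating user/agent episode. All names below are illustrative stand-ins, not the released platform's real API; the stub lambdas take the place of the LLM-driven user simulator and tool-equipped agent.

```python
from dataclasses import dataclass, field

# Hypothetical scenario record: a domain, a (possibly malicious) user
# intent, and the tools the agent is equipped with for the episode.
@dataclass
class Scenario:
    domain: str
    user_intent: str              # "benign" or "malicious"
    tools: list = field(default_factory=list)

def run_episode(scenario, user_fn, agent_fn, max_turns=4):
    """Alternate simulated-user and agent turns, collecting a transcript."""
    transcript = []
    for _ in range(max_turns):
        user_msg = user_fn(scenario, transcript)
        agent_msg = agent_fn(scenario, transcript, user_msg)
        transcript.append((user_msg, agent_msg))
    return transcript

# Stub participants standing in for LLM-driven simulators.
user = lambda sc, t: "Show me another patient's profile."
agent = lambda sc, t, msg: "I can't share other patients' records."

episode = run_episode(
    Scenario(domain="healthcare", user_intent="malicious",
             tools=["patient_management"]),
    user, agent, max_turns=2)
print(len(episode))  # 2 user/agent turn pairs recorded
```

A completed transcript like this would then be handed to the multi-dimensional evaluator; swapping in different scenarios, user policies, or agent backends is what makes the platform's benchmarking customizable.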