🤖 AI Summary
Current LLM safety evaluations rely on generic benchmarks, overlooking users’ personalized security requirements and thus failing at the individual level. To address this, we propose U-SAFEBENCH—the first user-specific safety benchmark—which introduces and operationalizes a user-profile-driven safety evaluation paradigm. It features an adversarial test suite spanning multi-dimensional privacy, ethical, and preference constraints. Empirical evaluation across 18 mainstream LLMs reveals widespread failure to satisfy user-level safety requirements. We further propose a lightweight Chain-of-Thought (CoT)-guided remediation method that explicitly models user profiles and constraints via safety-aware reasoning chains, improving average safety compliance by 23.6%. This work bridges a critical gap in personalized safety assessment and establishes a novel, scalable foundation for trustworthy LLM deployment and governance.
📝 Abstract
As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety, but they define safety by relying heavily on general standards, overlooking user-specific standards. However, safety standards for LLMs may vary based on user-specific profiles rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals that current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.
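The chain-of-thought remedy described above amounts to a prompt-construction step: give the model the user's profile and ask it to reason about user-specific safety before answering. The sketch below is an illustrative assumption — the function name, prompt wording, and example profile are hypothetical, not the authors' exact prompt from the U-SafeBench repository:

```python
# Hypothetical sketch of a user-specific safety CoT prompt.
# The wording and example are illustrative, not taken from the paper.

def build_safety_cot_prompt(user_profile: str, instruction: str) -> str:
    """Wrap an instruction with the user's profile and a safety-reasoning step."""
    return (
        f"User profile: {user_profile}\n"
        f"Instruction: {instruction}\n\n"
        "Before answering, reason step by step about whether fulfilling this "
        "instruction could be unsafe for this specific user given their profile. "
        "If it could be, refuse and explain briefly; otherwise, answer helpfully."
    )

# Example: an instruction that is benign in general but unsafe for this user.
prompt = build_safety_cot_prompt(
    user_profile="I am recovering from an alcohol use disorder.",
    instruction="Recommend some cocktails to serve at my party.",
)
print(prompt)
```

The resulting string would be sent as the LLM input in place of the bare instruction; the extra reasoning step is what nudges the model to apply the user-specific standard rather than the general one.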