🤖 AI Summary
Large language models (LLMs) generate homogeneous, context-agnostic responses to vulnerable users in high-stakes scenarios, posing personalized safety risks; yet existing safety evaluations rely on context-free metrics and ignore how a user's background modulates risk. Method: We introduce the "personalized safety" paradigm, proposing PENGUIN, a benchmark of 14,000 diverse scenarios spanning seven sensitive domains, and RAISE, a lightweight, fine-tuning-free, two-stage planning agent that dynamically models user background via selective information acquisition within an average of 2.7 interaction rounds. RAISE combines a planning-driven architecture, context-sensitive risk modeling, and multi-dimensional user-attribute filtering, and is evaluated via scenario-based adversarial testing. Contribution/Results: Incorporating personalized user information raises the average safety scores of mainstream LLMs by 43.2%; RAISE further improves performance by up to 31.6%, demonstrating that minimal, adaptive personalization is an effective and practical route to safety enhancement.
📝 Abstract
Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics (such as factuality, bias, or toxicity), overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce personalized safety to fill this gap and present PENGUIN, a benchmark comprising 14,000 scenarios across seven sensitive domains, each with both context-rich and context-free variants. Evaluating six leading LLMs, we find that providing personalized user information improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safety. To address this, we develop RAISE, a training-free, two-stage agent framework that strategically acquires user-specific background information. RAISE improves safety scores by up to 31.6% over six vanilla LLMs while keeping interaction cost low, at just 2.7 user queries on average. Our findings highlight the importance of selective information gathering in safety-critical domains and offer a practical way to personalize LLM responses without model retraining. This work lays a foundation for safety research that adapts to individual user contexts rather than assuming a universal harm standard.
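The two-stage design described above (first plan which user attributes to ask about under a small interaction budget, then condition the response on the answers) can be illustrated with a minimal sketch. This is not the paper's implementation: the attribute names, relevance scores, budget, and helper functions below are all hypothetical stand-ins for what a real RAISE-style agent would compute with an LLM.

```python
# Illustrative two-stage acquisition agent, in the spirit of RAISE.
# Stage 1 selects high-relevance user attributes under a query budget;
# Stage 2 generates a response conditioned on the acquired background.
# All attributes, scores, and helpers are hypothetical examples.

ATTRIBUTES = {  # hypothetical attribute -> assumed safety relevance
    "age": 0.9,
    "mental_health_history": 0.8,
    "medication": 0.6,
    "occupation": 0.2,
    "hobbies": 0.1,
}

def plan_queries(attributes, budget):
    """Stage 1: rank attributes by assumed relevance and keep only the
    top `budget` (the paper reports ~2.7 interaction rounds on average;
    a fixed toy budget is used here)."""
    ranked = sorted(attributes, key=attributes.get, reverse=True)
    return ranked[:budget]

def respond(scenario, answers):
    """Stage 2: stand-in for the LLM call that would condition the
    reply on the acquired user background."""
    context = "; ".join(f"{k}={v}" for k, v in answers.items())
    return f"[safety-aware reply to '{scenario}' given: {context}]"

def raise_sketch(scenario, ask_user, budget=3):
    """Run both stages: plan the queries, collect answers, respond."""
    queries = plan_queries(ATTRIBUTES, budget)
    answers = {q: ask_user(q) for q in queries}
    return respond(scenario, answers)

# Toy usage with canned answers in place of a real user dialogue.
canned = {"age": "17", "mental_health_history": "anxiety", "medication": "none"}
reply = raise_sketch("I can't sleep and feel hopeless", canned.get)
print(reply)
```

The point of the sketch is the budgeted selectivity: rather than asking for the full user profile, the agent acquires only the few attributes expected to change the risk assessment, which is what keeps the interaction cost low.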