Personalized Safety Alignment for Text-to-Image Diffusion Models

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-image diffusion models employ uniform safety policies that fail to accommodate individual user differences in age, mental health, religious beliefs, and other sensitive attributes. To address this, we propose Personalized Safety Alignment (PSA), the first framework for user-level safety alignment. PSA models user safety preferences as structured profiles and introduces Sage, the first benchmark dataset of user-specific safety preference annotations. We design a cross-attention-based safety configuration embedding mechanism that dynamically modulates the diffusion process conditioned on user profiles, together with a preference-aware training strategy that jointly optimizes safety and image fidelity. Extensive experiments show that PSA significantly outperforms baseline methods at suppressing harmful content, achieving substantial gains in Win Rate (+24.7%) and Pass Rate (+18.3%) while preserving image quality. The code, Sage dataset, and trained models are publicly released.
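The summary describes injecting an encoded user profile into the diffusion model via cross-attention. A minimal sketch of that idea, assuming the profile is encoded as extra key/value tokens appended to the text-prompt context (all names, dimensions, and the single-head NumPy implementation are illustrative, not the PSA implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, context, d_k=16, seed=0):
    """Single-head cross-attention: image latents attend to context tokens."""
    rng = np.random.default_rng(seed)
    d_lat, d_ctx = latents.shape[-1], context.shape[-1]
    # Randomly initialized projections stand in for learned weights.
    W_q = rng.standard_normal((d_lat, d_k)) / np.sqrt(d_lat)
    W_k = rng.standard_normal((d_ctx, d_k)) / np.sqrt(d_ctx)
    W_v = rng.standard_normal((d_ctx, d_k)) / np.sqrt(d_ctx)
    q, k, v = latents @ W_q, context @ W_k, context @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_k))
    return attn @ v

# Text-prompt embeddings plus appended safety-profile embeddings
# (a hypothetical encoding of the user's structured profile).
text_tokens = np.random.default_rng(1).standard_normal((77, 32))
safety_tokens = np.random.default_rng(2).standard_normal((4, 32))
context = np.concatenate([text_tokens, safety_tokens], axis=0)

latents = np.random.default_rng(3).standard_normal((64, 48))  # flattened image latents
out = cross_attention(latents, context)
print(out.shape)  # (64, 16)
```

Because the safety tokens sit in the same context the latents attend to, the model can modulate generation per user without retraining a separate model for each profile.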

📝 Abstract
Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences and incorporates these profiles through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content better with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
Problem

Research questions and friction points this paper is trying to address.

Uniform safety standards ignore individual user preferences
Diverse safety boundaries vary by age, mental health, beliefs
Need personalized control over safety in image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized Safety Alignment for user control
Integrates user profiles via cross-attention mechanism
Sage dataset captures individual safety preferences
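The innovations above hinge on representing each user's safety preferences as a structured profile. A hypothetical sketch of such a profile as a small data structure (the field names and the blocking rule are assumptions for illustration, not Sage's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class UserSafetyProfile:
    """Illustrative structured safety profile; fields are assumed, not Sage's schema."""
    age_group: str                                      # e.g. "minor", "adult"
    sensitivities: list = field(default_factory=list)   # content tags the user avoids
    allowed_categories: list = field(default_factory=list)  # explicit opt-ins

    def blocks(self, content_tag: str) -> bool:
        # A tag is blocked when the user is sensitive to it
        # and has not explicitly opted in.
        return (content_tag in self.sensitivities
                and content_tag not in self.allowed_categories)

profile = UserSafetyProfile(age_group="adult", sensitivities=["gore"])
print(profile.blocks("gore"))      # True
print(profile.blocks("violence"))  # False
```

In PSA such a profile would be encoded into embeddings and fed to the diffusion model, rather than applied as a hard post-hoc filter like this toy rule.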