🤖 AI Summary
Current text-to-image diffusion models employ uniform safety policies, failing to accommodate individual user differences in age, mental health, religious beliefs, and other sensitive attributes. To address this, we propose PSA (Personalized Safety Alignment), the first framework for user-level safety alignment. PSA models user safety preferences as structured profiles and introduces Sage, the first benchmark dataset of user-specific safety preference annotations. We design a cross-attention-based safety configuration embedding mechanism that dynamically modulates the diffusion process conditioned on the user's profile. Additionally, we propose a preference-aware training strategy that jointly optimizes safety and fidelity. Extensive experiments demonstrate that PSA significantly outperforms baseline methods in suppressing harmful content, achieving substantial gains in Win Rate (+24.7%) and Pass Rate (+18.3%) while preserving image quality. The code, Sage dataset, and trained models are publicly released.
📝 Abstract
Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce Sage, a new dataset that captures user-specific safety preferences; PSA incorporates these profiles into generation through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content better with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
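The abstract does not spell out the cross-attention conditioning, but the described idea, diffusion features attending to an embedded user safety profile, can be sketched as plain scaled dot-product cross-attention. The function name, shapes, and weights below are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def profile_cross_attention(latent, profile, Wq, Wk, Wv):
    """Image latent tokens (queries) attend to user-profile embeddings (keys/values).

    latent:  (n_tokens, d_latent)   intermediate diffusion features
    profile: (n_attrs, d_profile)   embedded safety-profile attributes
    """
    Q = latent @ Wq                      # (n_tokens, d_head)
    K = profile @ Wk                     # (n_attrs, d_head)
    V = profile @ Wv                     # (n_attrs, d_head)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores, axis=-1)      # each token weighs profile attributes
    return attn @ V                      # profile-conditioned features

# Toy shapes: 8 latent tokens, 4 profile attributes (e.g. age, beliefs).
rng = np.random.default_rng(0)
latent  = rng.normal(size=(8, 16))
profile = rng.normal(size=(4, 12))
Wq = rng.normal(size=(16, 16))
Wk = rng.normal(size=(12, 16))
Wv = rng.normal(size=(12, 16))
out = profile_cross_attention(latent, profile, Wq, Wk, Wv)
print(out.shape)  # (8, 16)
```

In a real model these profile-conditioned features would be added back into the U-Net's residual stream at each conditioning layer, so the same profile steers every denoising step.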