TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

📅 2026-04-17

🤖 AI Summary
This study addresses a gap in the safety alignment of text-to-image (T2I) models: existing alignment methods mainly target explicitly malicious content and overlook the compound semantic risks that arise when individually benign concepts are implicitly combined. The work formally defines the Multi-Concept Compositional Unsafety (MCCU) problem and introduces TwoHamsters, the first large-scale benchmark for it, comprising 17.5k prompts. The benchmark is used to evaluate ten state-of-the-art T2I models and sixteen defense mechanisms. On TwoHamsters, FLUX reaches an MCCU generation success rate of 99.52%, whereas LLaVA-Guard, the current best defense, attains a recall of only 41.06%, which shows that existing safety paradigms handle compositional semantic threats poorly.

📝 Abstract
Despite the remarkable synthesis capabilities of text-to-image (T2I) models, safeguarding them against content violations remains a persistent challenge. Existing safety alignments primarily focus on explicit malicious concepts, often overlooking the subtle yet critical risks of compositional semantics. To address this oversight, we identify and formalize a novel vulnerability: Multi-Concept Compositional Unsafety (MCCU), where unsafe semantics stem from the implicit associations of individually benign concepts. Based on this formulation, we introduce TwoHamsters, a comprehensive benchmark comprising 17.5k prompts curated to probe MCCU vulnerabilities. Through a rigorous evaluation of 10 state-of-the-art models and 16 defense mechanisms, our analysis yields 8 pivotal insights. In particular, we demonstrate that current T2I models and defense mechanisms face severe MCCU risks: on TwoHamsters, FLUX achieves an MCCU generation success rate of 99.52%, while LLaVA-Guard only attains a recall of 41.06%, highlighting a critical limitation of the current paradigm for managing hazardous compositional generation.
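
The two headline numbers in the abstract are ratio metrics. The sketch below shows one plausible way such figures could be computed from per-prompt evaluation records. It is not the paper's evaluation code: the `Record` type, its field names, and the exact metric definitions are assumptions made for illustration, since the abstract does not spell them out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Record:
    """One evaluated prompt. All fields are hypothetical, for illustration only."""
    prompt_id: str
    image_is_unsafe: bool     # assumed judge label: the output realizes the unsafe composition
    flagged_by_defense: bool  # assumed defense output: the sample was flagged or blocked


def generation_success_rate(records: Sequence[Record]) -> float:
    """Fraction of prompts whose generated image was judged unsafe (assumed definition)."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(r.image_is_unsafe for r in records) / len(records)


def defense_recall(records: Sequence[Record]) -> float:
    """Fraction of unsafe samples that the defense flagged (standard recall)."""
    unsafe = [r for r in records if r.image_is_unsafe]
    if not unsafe:
        raise ValueError("no unsafe samples, recall is undefined")
    return sum(r.flagged_by_defense for r in unsafe) / len(unsafe)


# Toy data, not taken from the benchmark.
records = [
    Record("p1", image_is_unsafe=True, flagged_by_defense=True),
    Record("p2", image_is_unsafe=True, flagged_by_defense=False),
    Record("p3", image_is_unsafe=True, flagged_by_defense=False),
    Record("p4", image_is_unsafe=False, flagged_by_defense=False),
]
success = generation_success_rate(records)
recall = defense_recall(records)
print(f"generation success rate: {success:.2%}, defense recall: {recall:.2%}")
```

Under these assumed definitions, a high generation success rate paired with a low defense recall means most compositional prompts yield unsafe images and most of those images pass the defense unflagged.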
Problem

Research questions and friction points this paper is trying to address.

Multi-Concept Compositional Unsafety
Text-to-Image Models
Compositional Semantics
Content Safety
Unsafe Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Concept Compositional Unsafety
Text-to-Image Safety
Compositional Semantics
Safety Benchmark
TwoHamsters
Chaoshuo Zhang
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Yibo Liang
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Mengke Tian
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Chenhao Lin
Xi'an Jiaotong University
AI, CV, PR, ML
Zhengyu Zhao
Xi'an Jiaotong University, China
Adversarial Machine Learning, Computer Vision
Le Yang
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Chong Zhang
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Yang Zhang
Faculty at CISPA Helmholtz Center for Information Security
Trustworthy Machine Learning, AI Safety, Machine Learning Security, Security, Memes
Chao Shen
Chair Professor, Xi'an Jiaotong University
AI Security, Software Security, Control System