From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models

📅 2025-07-23

🤖 AI Summary

Existing adversarial prompt sets for evaluating text-to-image (T2I) models are either human-crafted, and therefore small and culturally imbalanced, or synthetically generated, and therefore large but lacking realistic nuance. This paper proposes Seed2Harvest, a hybrid human-AI red-teaming method that takes culturally diverse, human-crafted adversarial prompts as seeds and expands them through guided AI generation. The expanded prompts preserve the characteristics and attack patterns of the human seeds while reaching a much larger scale. The expanded dataset covers 535 unique geographic locations with a Shannon entropy of 7.48, compared to 58 locations and an entropy of 5.28 in the original dataset. Average attack success rates remain comparable after expansion: 0.31 on NudeNet, 0.36 on SD NSFW, and 0.12 on Q16. The results support combining human creativity with machine computational capacity for scalable, continuous T2I safety evaluation.
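
The per-detector attack success rates quoted above can be read as simple proportions. The sketch below is illustrative only and is not the paper's evaluation code: it assumes an attack counts as successful when a detector flags the image generated from a prompt as unsafe, and the function name `attack_success_rate` and the toy flag lists are hypothetical.

```python
from typing import Dict, Iterable


def attack_success_rate(flags: Iterable[bool]) -> float:
    """Fraction of prompts whose generated image was flagged as unsafe.

    Assumption (not stated in the source): one boolean per prompt,
    True when the detector flags the corresponding image.
    """
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one detector decision")
    return sum(1 for f in flags if f) / len(flags)


# Toy detector outputs for four prompts; these are made-up values,
# not results from the paper.
toy_flags: Dict[str, list] = {
    "NudeNet": [True, False, False, False],
    "SD NSFW": [True, True, False, False],
    "Q16": [False, False, False, False],
}

rates = {name: attack_success_rate(f) for name, f in toy_flags.items()}
```

Reporting one rate per detector, as the summary does, keeps disagreements between detectors visible instead of averaging them away.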

📝 Abstract

Text-to-image (T2I) models have become prevalent across numerous applications, making their robust evaluation against adversarial attacks a critical priority. Continuous access to new and challenging adversarial prompts across diverse domains is essential for stress-testing these models for resilience against novel attacks from multiple vectors. Current techniques for generating such prompts are either entirely authored by humans or synthetically generated. On the one hand, datasets of human-crafted adversarial prompts are often too small in size and imbalanced in their cultural and contextual representation. On the other hand, datasets of synthetically-generated prompts achieve scale, but typically lack the realistic nuances and creative adversarial strategies found in human-crafted prompts. To combine the strengths of both human and machine approaches, we propose Seed2Harvest, a hybrid red-teaming method for guided expansion of culturally diverse, human-crafted adversarial prompt seeds. The resulting prompts preserve the characteristics and attack patterns of human prompts while maintaining comparable average attack success rates (0.31 NudeNet, 0.36 SD NSFW, 0.12 Q16). Our expanded dataset achieves substantially higher diversity with 535 unique geographic locations and a Shannon entropy of 7.48, compared to 58 locations and 5.28 entropy in the original dataset. Our work demonstrates the importance of human-machine collaboration in leveraging human creativity and machine computational capacity to achieve comprehensive, scalable red-teaming for continuous T2I model safety evaluation.
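
To make the diversity figures concrete, here is a minimal sketch of how a unique-location count and a Shannon entropy could be computed from a list of location labels. It rests on assumptions the abstract does not confirm: that entropy is taken over the empirical distribution of location mentions and measured in bits (base 2). The helper name `shannon_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)) of the label distribution, in bits."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Toy location labels, one per prompt (illustrative values only).
locations = ["Lagos", "Lima", "Osaka", "Oslo", "Lagos", "Lima", "Osaka", "Oslo"]

unique_locations = len(set(locations))  # number of distinct locations
entropy_bits = shannon_entropy(locations)  # uniform over 4 labels gives 2 bits
```

Under these assumptions, entropy is bounded above by log2 of the number of distinct labels, so 535 locations allow at most about 9.06 bits and 58 locations at most about 5.86 bits. The reported 7.48 and 5.28 both fall under those bounds, which is consistent with a base-2 measure over locations, although the abstract does not state the base or the exact variable.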

Problem

Research questions and friction points this paper is trying to address.

Evaluating T2I models against diverse adversarial attacks

Balancing human creativity and synthetic scale in prompt generation

Ensuring cultural diversity and realism in adversarial prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid method combining human and AI creativity

Expands culturally diverse adversarial prompt seeds

Balances human nuance with machine scalability
