FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Static benchmarking offers limited support for systematically diagnosing the root causes of failures in text-to-image (T2I) models. To address this, we propose an active exploration paradigm and introduce the first scalable, automated framework for discovering and mapping T2I failure modes. Our core methodological innovation is to model failure discovery as a structured search for minimal failure-inducing concepts, which makes the analysis diagnosis-first. We further combine novel acceleration techniques with active exploration strategies to identify error slices efficiently, uncovering over 247,000 previously unknown error slices in Stable Diffusion 1.5 alone. Crucially, our framework provides the first large-scale empirical evidence linking model failures to data scarcity in the training set, revealing a systematic correlation between data insufficiency and erroneous generations across diverse semantic concepts. This work lays a foundation for data-aware, interpretable diagnosis and improvement of T2I models.
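The data-scarcity finding is a correlation between how often a concept appears in training data and how often the model fails on it. The page does not say which statistic the authors use, so the following is only a minimal sketch assuming a Spearman rank correlation. The inputs `freq` (per-concept training-set counts) and `fail_rate` (per-concept failure rates) and the function `spearman_frequency_vs_failure` are hypothetical illustrations. They are not part of FailureAtlas.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; Spearman is Pearson applied to ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman_frequency_vs_failure(
    freq: Dict[str, int], fail_rate: Dict[str, float]
) -> float:
    """Hypothetical helper: rank-correlate training frequency with failure rate.

    A negative value means rarer concepts tend to fail more often.
    """
    concepts = sorted(freq.keys() & fail_rate.keys())
    if len(concepts) < 2:
        raise ValueError("need at least two shared concepts")
    x = average_ranks([float(freq[c]) for c in concepts])
    y = average_ranks([fail_rate[c] for c in concepts])
    return pearson(x, y)
```

With toy inputs where failure rate strictly decreases as frequency increases, such as `{"a": 10, "b": 100, "c": 1000}` against `{"a": 0.9, "b": 0.5, "c": 0.1}`, the function returns `-1.0`. Ranks are used instead of raw counts because concept frequencies in web-scale data are heavily skewed.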

📝 Abstract

Static benchmarks have provided a valuable foundation for comparing Text-to-Image (T2I) models. However, their passive design offers limited diagnostic power: they struggle to uncover the full landscape of systematic failures or to isolate their root causes. We argue for a complementary paradigm: active exploration. We introduce FailureAtlas, the first framework designed to autonomously explore and map the vast failure landscape of T2I models at scale. FailureAtlas frames error discovery as a structured search for minimal, failure-inducing concepts. Although this search is computationally explosive, we make it tractable with novel acceleration techniques. When applied to Stable Diffusion models, our method uncovers hundreds of thousands of previously unknown error slices (over 247,000 in SD1.5 alone) and provides the first large-scale evidence linking these failures to data scarcity in the training set. By providing a principled and scalable engine for deep model auditing, FailureAtlas establishes a new, diagnostic-first methodology to guide the development of more robust generative AI. The code is available at https://github.com/cure-lab/FailureAtlas
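To make "minimal, failure-inducing concepts" concrete, here is a minimal sketch of a level-wise search that is not the authors' algorithm. It assumes a hypothetical oracle `fails(concept_set)`, which in practice would generate images for a prompt built from the concepts and judge them. The search reports a concept set only if the oracle flags it and none of its strict subsets were flagged. Skipping supersets of known failures is one simple way to save oracle calls. FailureAtlas's actual acceleration techniques are not described on this page and may differ.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def find_minimal_failures(
    concepts: Iterable[str],
    fails: Callable[[FrozenSet[str]], bool],  # hypothetical T2I failure oracle
    max_size: int = 3,
) -> List[FrozenSet[str]]:
    """Level-wise search for minimal failure-inducing concept sets.

    Sets are tried in order of increasing size. A candidate that contains an
    already-reported failure is skipped, so when the oracle is queried every
    strict subset of the candidate has already been queried and passed. Each
    reported set is therefore minimal (up to max_size).
    """
    pool = sorted(set(concepts))
    minimal: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(pool, size):
            candidate = frozenset(combo)
            # Prune: a superset of a known failure cannot be minimal,
            # so skip the (expensive) oracle call entirely.
            if any(found <= candidate for found in minimal):
                continue
            if fails(candidate):
                minimal.append(candidate)
    return minimal
```

Consider a toy oracle that fails whenever "hands" appears or "astronaut" and "horse" co-occur. On the concepts `["astronaut", "horse", "hands", "cat"]`, the search returns `{"hands"}` and `{"astronaut", "horse"}`. It makes 7 oracle calls instead of the 14 needed to test every set of size 1 to 3.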

Problem

Research questions and friction points this paper is trying to address.

Mapping systematic failures in Text-to-Image models

Identifying root causes through active exploration

Linking failures to training data scarcity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Active exploration framework for T2I failure analysis

Structured search for minimal failure-inducing concepts

Novel acceleration techniques for tractable error discovery
