AI Summary
Static benchmarking offers limited support for systematically diagnosing the root causes of failures in text-to-image (T2I) models. To address this, we propose an active exploration paradigm and introduce the first scalable, automated framework for discovering and mapping T2I failure modes. Our core methodological innovation is to model failure discovery as a structured search for minimal failure-inducing concepts, which enables diagnosis-first analysis. We further combine novel acceleration techniques with active exploration strategies to efficiently identify error slices, for example uncovering over 247,000 previously unknown error slices in Stable Diffusion 1.5. Crucially, our framework provides the first large-scale empirical evidence linking training data scarcity to model failure, revealing a systematic correlation between data insufficiency and erroneous generations across diverse semantic concepts. This work lays a foundation for data-aware, interpretable diagnosis and improvement of T2I models.
Abstract
Static benchmarks have provided a valuable foundation for comparing Text-to-Image (T2I) models. However, their passive design offers limited diagnostic power: they struggle to uncover the full landscape of systematic failures or to isolate their root causes. We argue for a complementary paradigm: active exploration. We introduce FailureAtlas, the first framework designed to autonomously explore and map the vast failure landscape of T2I models at scale. FailureAtlas frames error discovery as a structured search for minimal, failure-inducing concepts. Although this search is combinatorially explosive, we make it tractable with novel acceleration techniques. When applied to Stable Diffusion models, our method uncovers hundreds of thousands of previously unknown error slices (over 247,000 in SD1.5 alone) and provides the first large-scale evidence linking these failures to data scarcity in the training set. By providing a principled and scalable engine for deep model auditing, FailureAtlas establishes a new, diagnostic-first methodology to guide the development of more robust generative AI. The code is available at https://github.com/cure-lab/FailureAtlas
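The abstract does not spell out the search procedure, so the following is only a generic sketch of what "searching for minimal, failure-inducing concepts" can mean. It is not FailureAtlas's algorithm and omits its acceleration techniques. It assumes a hypothetical oracle `fails(concepts)` that stands in for the expensive step of generating images from a prompt built from those concepts and checking them for errors. The sketch enumerates concept combinations from smallest to largest and uses one pruning rule: any superset of a known failing set cannot be minimal, so the oracle is never called on it.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def minimal_failing_sets(
    concepts: Iterable[str],
    fails: Callable[[FrozenSet[str]], bool],  # hypothetical, expensive oracle
    max_size: int = 3,
) -> List[FrozenSet[str]]:
    """Return concept sets that fail while none of their proper subsets fail.

    Sizes are visited in increasing order, so every proper subset of a
    candidate has already been tested (or pruned) before the candidate.
    """
    pool = sorted(set(concepts))
    found: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(pool, size):
            candidate = frozenset(combo)
            # Pruning: a superset of a known failure cannot be minimal,
            # so skip it without calling the oracle.
            if any(prev <= candidate for prev in found):
                continue
            if fails(candidate):
                found.append(candidate)
    return found


# Toy usage with a made-up oracle: "clock" fails on its own, and
# "astronaut" fails only when combined with "horse".
def toy_fails(s: FrozenSet[str]) -> bool:
    return "clock" in s or {"astronaut", "horse"} <= s


result = minimal_failing_sets(["astronaut", "horse", "clock", "dog"], toy_fails)
# result contains {"clock"} and {"astronaut", "horse"}; supersets are pruned.
```

Even with this pruning, the number of candidate combinations grows combinatorially with `max_size`, which is why a practical system needs further acceleration beyond this naive breadth-first enumeration.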