🤖 AI Summary
This work addresses the scalability limitations of concept-based explainable AI (XAI), which typically relies on labor-intensive manual annotation of images. For the first time, it systematically evaluates the feasibility of using zero-shot text-to-image generative models to construct synthetic concept datasets. Through four complementary analyses (concept representation similarity, intra-concept consistency, performance on downstream explanation tasks, and concept ablation), the study assesses the fidelity and explanatory validity of synthetic concepts relative to real ones. Findings indicate that while current text-to-image models can efficiently generate concept exemplars, they still exhibit limitations in faithfulness and explanatory consistency. The authors release the first open-source synthetic concept dataset, offering a promising new pathway toward scalable XAI.
📝 Abstract
Concept-based Explainable Artificial Intelligence (XAI) interprets deep learning models using human-understandable visual features (e.g., textures or object parts) by linking internal representations to class predictions, thereby bridging the gap between low-level image data and high-level semantics. A major challenge, however, is the reliance on large sets of labeled images to represent each concept, which limits scalability. In this work, we investigate the use of zero-shot Text-to-Image (T2I) generative models as a source of synthetic concept datasets for concept-based XAI methods. Specifically, we generate concepts using predefined prompts and evaluate their faithfulness to real ones through four complementary analyses: (1) comparing synthetic vs. real concept images via concept representation similarity; (2) evaluating their intra-similarity by comparing pairs of subsets of the same concept with progressively increasing size; (3) evaluating their performance on downstream explanation tasks using relevant class images; (4) evaluating how removing a concept from the tested class images affects the explanations obtained with generated concepts. While current T2I generative models promise a shortcut to concept-based XAI, our study highlights challenges and raises open questions about the use of synthetic data generated by zero-shot pipelines in model analyses. The resulting dataset is available at https://github.com/DataSciencePolimi/ZeroShot-T2I-Concepts.
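To make analyses (1) and (2) concrete, the following is a minimal sketch rather than the authors' pipeline. The abstract does not say how concept representations are computed, so the sketch assumes a simple difference-of-means concept direction in some layer's activation space, compared by cosine similarity. The function names, the toy Gaussian "activations", and the subset sizes are all hypothetical; in practice the activations would come from the model under analysis.

```python
import numpy as np

def concept_vector(concept_acts, random_acts):
    """Assumed concept representation: unit-norm difference between the mean
    activation of concept images and the mean activation of random images."""
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def intra_similarity(concept_acts, random_acts, subset_sizes, rng):
    """For each subset size n, build concept vectors from two disjoint
    subsets of n images of the same concept and compare them."""
    scores = {}
    for n in subset_sizes:
        idx = rng.permutation(len(concept_acts))
        first, second = concept_acts[idx[:n]], concept_acts[idx[n:2 * n]]
        scores[n] = cosine_similarity(
            concept_vector(first, random_acts),
            concept_vector(second, random_acts),
        )
    return scores

# Toy stand-ins for layer activations (hypothetical data, not real images).
rng = np.random.default_rng(0)
dim = 64
true_direction = np.zeros(dim)
true_direction[0] = 1.0
random_acts = rng.normal(size=(200, dim))
real_acts = rng.normal(size=(200, dim)) + 3.0 * true_direction
synthetic_acts = rng.normal(size=(200, dim)) + 2.0 * true_direction

# Analysis (1): similarity between real and synthetic concept representations.
real_vs_synth = cosine_similarity(
    concept_vector(real_acts, random_acts),
    concept_vector(synthetic_acts, random_acts),
)

# Analysis (2): intra-concept similarity for progressively larger subsets.
scores = intra_similarity(synthetic_acts, random_acts, [10, 50, 100], rng)

print(f"real vs. synthetic: {real_vs_synth:.3f}")
for n, s in scores.items():
    print(f"intra-similarity with {n} images per subset: {s:.3f}")
```

Under these assumptions, a value near 1 for the real-versus-synthetic comparison would suggest that the two image sets point to a similar direction in activation space, and intra-similarity that stabilizes as the subset size grows would suggest that the concept representation no longer depends strongly on which images were sampled.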