🤖 AI Summary
Concept-level interpretability for neural networks currently relies on manually constructed concept image sets, a process that is laborious and often omits critical concepts. To address this, the paper proposes a text-guided framework that generates concept images automatically. The method formulates concept generation as a reinforcement learning-based preference optimization (RLPO) task, integrating CLIP/ViT with diffusion models to synthesize semantically aligned, high-fidelity images, and incorporates human preference modeling to make abstract concepts easier to interpret. It translates approximate textual descriptions into class-specific concept images end to end, eliminating the need for manual curation. Evaluated on multiple benchmarks, the approach improves concept relevance and explanation quality. It is also shown to be useful for diagnosing model bias and analyzing decision rationales, supporting its applicability as an automated tool for neural network interpretation.
📝 Abstract
Concept-based explanations have become a popular choice for explaining deep neural networks post hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and collect multiple candidate concept image sets, a process that is often imprecise and labor-intensive. To address this limitation, we frame concept image set creation as an image generation problem. Because naively using a generative model does not yield meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate that our method can articulate complex and abstract concepts that align with the test class and are otherwise challenging to craft manually. In addition to showing the efficacy and reliability of our method, we show how it can be used as a diagnostic tool for analyzing neural networks.
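To make the "stripes" and zebra example concrete, the sketch below illustrates one common way a concept image set is used to probe a classifier, in the spirit of concept activation vector testing. It is not the RLPO algorithm described in the abstract, which is not specified here. The function names, the toy activations, and the use of a difference of means in place of a trained linear classifier are all assumptions made for illustration.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    # Unit vector pointing from the random-image activations toward the
    # concept-image activations. A difference of means is used here as a
    # simple stand-in for the linear classifier often trained for this step.
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_sensitivity_score(class_logit_grads, cav):
    # Fraction of class images whose class logit increases when the layer
    # activation is nudged along the concept direction.
    directional_derivatives = class_logit_grads @ cav
    return float(np.mean(directional_derivatives > 0))


# Toy data (hypothetical): 8-dimensional layer activations in which the
# "stripes" concept shifts the first coordinate.
rng = np.random.default_rng(0)
dim = 8
stripes_axis = np.eye(dim)[0]
concept_acts = rng.normal(size=(50, dim)) + 3.0 * stripes_axis
random_acts = rng.normal(size=(50, dim))

# Gradients of the "zebra" logit with respect to the layer activations for
# 20 zebra images; in this toy setup the logit depends mostly on that axis.
zebra_grads = 2.0 * stripes_axis + 0.1 * rng.normal(size=(20, dim))

cav = concept_activation_vector(concept_acts, random_acts)
score = concept_sensitivity_score(zebra_grads, cav)
print(f"stripes -> zebra sensitivity score: {score:.2f}")
```

A score near 1 indicates that moving activations toward the concept raises the class logit for almost every class image. In practice the activations and gradients would come from a real network layer, and the concept images are exactly the set that practitioners must otherwise guess and collect by hand.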