Explainable Concept Generation through Vision-Language Preference Learning

📅 2024-08-24
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address the limitations of manually constructing concept image sets and the frequent omission of critical concepts in concept-level interpretability of neural networks, this paper proposes a text-guided framework for automatic concept image generation. The method frames concept generation as a reinforcement learning-based preference optimization (RLPO) task, combining CLIP/ViT with diffusion models to synthesize semantically aligned, high-fidelity images, and incorporates preference modeling to make abstract concepts interpretable. It translates ambiguous textual descriptions end-to-end into high-quality, class-specific concept images, eliminating the need for manual curation. Evaluated on multiple benchmarks, the approach significantly improves concept relevance and explanation quality. It further demonstrates practical utility in model bias diagnosis and decision-rationale analysis, validating its effectiveness as an automated tool for neural network interpretation.

📝 Abstract
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and collect multiple candidate concept image sets, which can often be imprecise and labor-intensive. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes the vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate the capability of our method to articulate complex and abstract concepts that align with the test class and are otherwise challenging to craft manually. In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
Problem

Research questions and friction points this paper is trying to address.

Automating labor-intensive manual concept image set creation
Generating meaningful concepts via vision-language preference learning
Improving explainability of neural networks' internal representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates concept images via vision-language model
Uses RLPO for meaningful concept optimization
Automates labor-intensive manual concept collection
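The preference-optimization idea behind the contributions above can be sketched in miniature: a policy over candidate concept prompts is updated so that prompts whose generated content is preferred for the target class gain probability mass. Everything below is illustrative only — the prompt list, the toy embeddings, and the cosine reward are stand-ins, not the paper's pipeline, which fine-tunes a diffusion model with rewards derived from a vision-language model rather than the fixed vectors used here.

```python
import math
import random

# Hypothetical candidate concept prompts for a target class (e.g. "zebra").
PROMPTS = ["stripes", "grass", "mane", "sky"]

# Toy 2-D embeddings standing in for CLIP text/image features (illustrative).
EMBED = {
    "stripes": [0.9, 0.1],
    "grass":   [0.2, 0.7],
    "mane":    [0.7, 0.3],
    "sky":     [0.1, 0.9],
}
TARGET = [1.0, 0.0]  # toy embedding of the target class

def cosine(u, v):
    """Cosine similarity, used here as a stand-in preference reward."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rlpo_sketch(steps=500, lr=0.5, seed=0):
    """REINFORCE-style preference optimization over a softmax prompt policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(PROMPTS)
    baseline = 0.0  # running reward baseline to reduce update variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(PROMPTS)), weights=probs)[0]
        reward = cosine(EMBED[PROMPTS[i]], TARGET)
        baseline = 0.9 * baseline + 0.1 * reward
        adv = reward - baseline
        # Softmax policy gradient: d log p_i / d logit_j = 1{j==i} - p_j.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * adv * grad
    return softmax(logits)

probs = rlpo_sketch()
print(PROMPTS[probs.index(max(probs))])
```

Under this toy reward, probability mass shifts toward prompts whose embeddings best align with the target class; in the paper's setting, the analogous signal steers the generative model toward class-relevant concept images without hand-curated image sets.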