Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing visual explanation methods rarely combine high-level, human-understandable concepts with causal guarantees: concept-based methods are typically limited to single concepts, while formal methods operate on pixel-level features. This work proposes the first framework that integrates high-level concepts with formal abductive and contrastive explanations. Using concept erasure to establish causality, the approach enumerates the minimal sets of concepts responsible for a model’s behavior and aggregates them to explain both individual images and sets of images that share a behavior. Combining concept bottleneck models with ideas from formal verification, the method produces concise, causal, and user-friendly concept-level attributions across diverse models, datasets, and user-specified behaviors.

📝 Abstract

*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set of input features causally relevant for model outcomes but only considers low-level features such as pixels. Merging these two threads, we propose the notion of *concept-based abductive and contrastive explanations* that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using *concept erasure* procedures to establish causal relationships. By appropriately aggregating such explanations, we are able to understand model predictions not only on individual images but also on collections of images where the model exhibits a user-specified, common *behavior*. We evaluate our approach on multiple models, datasets, and behaviors, and demonstrate its effectiveness in computing helpful, user-friendly explanations.
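
To make the two notions in the abstract concrete, here is a minimal brute-force sketch on a toy classifier. It is not the paper's algorithm. It assumes that a model can be treated as a function from the set of concepts present in an image to a label, and that "erasing" a concept simply removes it from that set. Under these assumptions, an abductive explanation is a subset-minimal set of concepts that, when kept, preserves the prediction no matter which other concepts are erased, and a contrastive explanation is a subset-minimal set whose erasure changes the prediction. The names `toy_predict`, `is_sufficient`, `is_contrastive`, and `minimal_sets` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_sufficient(predict, concepts, keep):
    """True if keeping `keep` preserves the prediction under every erasure
    of the remaining concepts (assumed reading of 'abductive')."""
    target = predict(concepts)
    rest = concepts - keep
    return all(predict(concepts - erased) == target for erased in subsets(rest))


def is_contrastive(predict, concepts, erased):
    """True if erasing `erased` changes the prediction
    (assumed reading of 'contrastive')."""
    return predict(concepts - erased) != predict(concepts)


def minimal_sets(concepts, holds):
    """Enumerate all subset-minimal sets satisfying `holds`.

    Candidates are visited in order of increasing size, and any candidate
    that contains an already found set is skipped, so only minimal sets
    are returned. The search is exponential in the number of concepts.
    """
    found = []
    for candidate in subsets(concepts):
        if any(smaller <= candidate for smaller in found):
            continue
        if holds(candidate):
            found.append(candidate)
    return found


def toy_predict(present):
    """Hypothetical stand-in for a vision model plus concept erasure."""
    if "stripes" in present and ("four_legs" in present or "mane" in present):
        return "zebra"
    return "other"


image_concepts = frozenset({"stripes", "four_legs", "mane", "grass"})

axps = minimal_sets(
    image_concepts, lambda s: is_sufficient(toy_predict, image_concepts, s)
)
cxps = minimal_sets(
    image_concepts, lambda s: is_contrastive(toy_predict, image_concepts, s)
)

print("abductive:", [sorted(s) for s in axps])
print("contrastive:", [sorted(s) for s in cxps])
```

On this toy model the abductive explanations are {four_legs, stripes} and {mane, stripes}, and the contrastive explanations are {stripes} and {four_legs, mane}. Every abductive explanation shares at least one concept with every contrastive one, which matches the hitting-set duality known from formal explanation work on low-level features. A practical method would replace the exhaustive search with something far more efficient and would use a real concept erasure procedure in place of set removal.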

Problem

Research questions and friction points this paper is trying to address.

concept-based explanations

abductive explanations

contrastive explanations

causal reasoning

vision models

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept-based explanation

abductive explanation

contrastive explanation