Explaining Low Perception Model Competency with High-Competency Counterfactuals

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of explaining low-confidence predictions in image classification models. The authors propose a framework that uses counterfactual image generation to explain low model competency, a generalized form of predictive uncertainty, thereby bridging uncertainty quantification and human-interpretable diagnostic reasoning. Methodologically, they design five counterfactual generation strategies: Image Gradient Descent (IGD), Feature Gradient Descent (FGD), Autoencoder Reconstruction (Reco), Latent Gradient Descent (LGD), and Latent Nearest Neighbors (LNN). The resulting counterfactuals are supplied to a pre-trained multimodal large language model (MLLM) to produce natural-language attributions of the cause of low competency. Evaluated on two datasets containing images with six known causes of low model competency, Reco, LGD, and LNN achieve the strongest performance. Crucially, including a counterfactual image in the MLLM query substantially improves attribution accuracy, supporting the role of counterfactuals as an interpretability bridge between model uncertainty and human-understandable explanations.
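To illustrate the flavor of these strategies, below is a minimal sketch of latent-space counterfactual generation in the spirit of LGD: ascend a competency score by gradient descent on an autoencoder's latent code, then decode. The names `encoder`, `decoder`, and `competency_fn`, and all hyperparameters, are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of Latent Gradient Descent (LGD), assuming an
# autoencoder (encoder/decoder) and a differentiable competency
# estimator. All names and hyperparameters are illustrative
# assumptions, not the paper's exact formulation.
import torch

def lgd_counterfactual(x, encoder, decoder, competency_fn,
                       steps=200, lr=0.05, target=0.9):
    """Ascend the competency score in latent space, then decode."""
    # Start from the latent code of the low-competency image.
    z = encoder(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_hat = decoder(z)            # candidate counterfactual image
        score = competency_fn(x_hat)  # scalar competency estimate
        if score.item() >= target:    # stop once competency is high enough
            break
        opt.zero_grad()
        (-score).backward()           # gradient ascent on competency
        opt.step()
    with torch.no_grad():
        return decoder(z)
```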

📝 Abstract
There exist many methods to explain how an image classification model generates its decision, but very little work has explored methods to explain why a classifier might lack confidence in its prediction. As there are various reasons the classifier might lose confidence, it would be valuable for this model to not only indicate its level of uncertainty but also explain why it is uncertain. Counterfactual images have been used to visualize changes that could be made to an image to generate a different classification decision. In this work, we explore the use of counterfactuals to offer an explanation for low model competency, a generalized form of predictive uncertainty that measures confidence. Toward this end, we develop five novel methods to generate high-competency counterfactual images, namely Image Gradient Descent (IGD), Feature Gradient Descent (FGD), Autoencoder Reconstruction (Reco), Latent Gradient Descent (LGD), and Latent Nearest Neighbors (LNN). We evaluate these methods across two unique datasets containing images with six known causes for low model competency and find Reco, LGD, and LNN to be the most promising methods for counterfactual generation. We further evaluate how these three methods can be utilized by pre-trained Multimodal Large Language Models (MLLMs) to generate language explanations for low model competency. We find that the inclusion of a counterfactual image in the language model query greatly increases the ability of the model to generate an accurate explanation for the cause of low model competency, thus demonstrating the utility of counterfactual images in explaining low perception model competency.
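As a companion to the gradient-based sketch above, here is a hedged sketch of the retrieval idea behind LNN: encode the low-competency image and return the closest training image whose competency score is high. Again, `encoder`, the score tensor, and the threshold are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of Latent Nearest Neighbors (LNN): retrieve the
# closest high-competency training image in latent space. The
# encoder, score tensor, and threshold are illustrative assumptions.
import torch

def lnn_counterfactual(x, encoder, train_images, train_scores, thresh=0.9):
    with torch.no_grad():
        z = encoder(x)                          # latent code of the query image
        keep = train_scores >= thresh           # high-competency training subset
        candidates = train_images[keep]
        z_train = encoder(candidates)           # latent codes of the candidates
        # Euclidean distance from the query to every candidate code.
        dists = torch.cdist(z.flatten(1), z_train.flatten(1))
        return candidates[dists.argmin(dim=1)]  # nearest high-competency image
```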
Problem

Research questions and friction points this paper is trying to address.

Explaining low confidence in image classifier predictions
Generating counterfactual images to reveal uncertainty causes
Using MLLMs to create language explanations with counterfactuals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates high-competency counterfactual images
Develops five novel counterfactual methods
Uses MLLMs for language explanations (a hedged sketch of such a query follows this list)
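To make the MLLM step concrete, here is a minimal sketch of how a two-image attribution query might be assembled. The `query_mllm` callable (a client taking a list of images and a text prompt), the function name, and the prompt wording are all assumptions for illustration; the paper's actual prompts and model interface are not reproduced here.

```python
# Minimal sketch of a two-image MLLM attribution query. `query_mllm`
# and the prompt wording are illustrative assumptions, not the
# paper's actual prompts or model interface.
def explain_low_competency(x_low, x_counterfactual, query_mllm):
    prompt = (
        "The first image received a low competency score from an image "
        "classifier. The second image is a modified version that receives "
        "a high competency score. Based on the differences between them, "
        "explain what likely caused the low competency of the first image."
    )
    return query_mllm([x_low, x_counterfactual], prompt)
```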
Sara Pohland
University of California, Berkeley, CA 94720, USA
Claire Tomlin
University of California, Berkeley, CA 94720, USA