A Novel Framework for Automated Explain Vision Model Using Vision-Language Models

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision models prioritize accuracy-oriented metrics (e.g., IoU, mAP) while neglecting interpretability; mainstream XAI methods provide only instance-level attribution explanations and fail to characterize global behavioral patterns across entire datasets, hindering bias detection and failure analysis. Method: We propose the first automated dual-granularity explanation framework grounded in vision-language models (VLMs), unifying instance-level attribution mapping with dataset-level semantic abstraction. Leveraging prompt learning and cross-modal alignment, it generates human-readable, semantically rich behavioral explanations without model fine-tuning. Contribution/Results: Experiments demonstrate that our framework efficiently identifies failure cases, uncovers decision preferences and latent biases, and enhances model transparency and trustworthiness—all at low computational cost. It bridges the gap between local explainability and global behavioral understanding in vision systems.

📝 Abstract
The development of many vision models focuses mainly on improving performance using metrics such as accuracy, IoU, and mAP, with less attention to explainability, owing to the complexity of applying XAI methods to produce meaningful explanations of trained models. Although many existing XAI methods explain vision models sample by sample, methods that explain a vision model's general behavior, which can only be captured after running it on a large dataset, remain underexplored. Understanding how vision models behave on general images is important for preventing biased judgments and for identifying a model's trends and patterns. Using vision-language models, this paper proposes a pipeline that explains vision models at both the sample and dataset levels. The proposed pipeline can discover failure cases and yield insights into vision models with minimal effort, thereby integrating vision model development with XAI analysis to advance image analysis.
Problem

Research questions and friction points this paper is trying to address.

Explaining vision models' general behavior on large datasets
Reducing complexity of applying xAI methods for meaningful explanations
Preventing biased judgments by understanding model trends and patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language models for explainable AI
Pipeline for sample and dataset level analysis
Automated failure case discovery and insights
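The dataset-level half of such a pipeline can be illustrated with a minimal sketch. Assume the instance-level stage has already produced, for each sample, a set of concept tags (e.g., a VLM's description of the salient regions in an attribution map) plus a correctness flag; the names `dataset_level_summary`, the `concepts`/`correct` fields, and the example tags below are all hypothetical illustrations, not the paper's actual implementation. The dataset-level stage then aggregates these tags to surface patterns over-represented among failures:

```python
from collections import Counter

def dataset_level_summary(per_sample_explanations, top_k=3):
    """Aggregate per-sample concept tags into dataset-level patterns.

    Concepts that appear disproportionately in failure cases hint at
    decision preferences or latent biases of the vision model.
    """
    fail_concepts = Counter()
    ok_concepts = Counter()
    for sample in per_sample_explanations:
        bucket = ok_concepts if sample["correct"] else fail_concepts
        bucket.update(sample["concepts"])
    return {
        "failure_patterns": fail_concepts.most_common(top_k),
        "success_patterns": ok_concepts.most_common(top_k),
    }

# Hypothetical output of the instance-level (VLM) stage:
explanations = [
    {"correct": False, "concepts": ["low light", "occlusion"]},
    {"correct": False, "concepts": ["low light"]},
    {"correct": True,  "concepts": ["frontal view"]},
]
summary = dataset_level_summary(explanations)
print(summary["failure_patterns"][0])  # ('low light', 2)
```

In this toy run, "low light" dominates the failure bucket, which is the kind of human-readable, dataset-level signal the paper argues instance-level attribution maps alone cannot provide.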
Phu-Vinh Nguyen
Uppsala University, Sweden
Tan-Hanh Pham
MGH - Harvard Medical School
RoboticsAI
Chris Ngo
Knovel Engineering
Truong Son Hy
Department of Computer Science, University of Alabama at Birmingham, USA