A Survey on Interpretability in Visual Recognition

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current research on explainability in visual recognition models lacks a systematic classification framework, hindering reliable deployment in high-stakes domains such as autonomous driving and medical diagnosis. Method: We propose the first human-centered, four-dimensional taxonomy (comprising explanation intent, target object, presentation modality, and methodological foundation), derived by integrating human-computer interaction (HCI) principles with eXplainable AI (XAI) theory. Through a comprehensive literature review, we formalize evaluation criteria for each dimension and conduct the first systematic analysis of the opportunities introduced by multimodal large language models (MLLMs). Contribution/Results: Our framework enables structured organization of explainability methods, facilitates failure diagnosis of vision models, guides principled design of explanation techniques, and establishes both a rigorous theoretical foundation and an actionable roadmap for deploying interpretable models in safety-critical applications.

📝 Abstract
In recent years, visual recognition methods have advanced significantly, finding applications across diverse fields. While researchers seek to understand the mechanisms behind the success of these models, there is also a growing impetus to deploy them in critical areas such as autonomous driving and medical diagnostics, where failures must be diagnosed reliably; this need has spurred the development of interpretability research. This paper systematically reviews existing research on the interpretability of visual recognition models and proposes a taxonomy of methods from a human-centered perspective. The proposed taxonomy categorizes interpretable recognition methods along four dimensions (Intent, Object, Presentation, and Methodology), thereby establishing a systematic and coherent set of grouping criteria for these XAI methods. Additionally, we summarize the requirements for evaluation metrics and explore new opportunities enabled by recent technologies, such as large multimodal models. We aim to organize existing research in this domain and to inspire future investigation into the interpretability of visual recognition models.
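To make the four taxonomy dimensions concrete, here is a minimal sketch of how interpretability methods could be tagged along them in code. All axis values, type names, and the example tagging below are illustrative assumptions, not categories taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical value sets for the four taxonomy axes. The paper's actual
# category lists may differ; these entries are placeholders for illustration.
class Intent(Enum):
    FAILURE_DIAGNOSIS = "diagnose model failures"
    DECISION_JUSTIFICATION = "justify an individual prediction"

class TargetObject(Enum):
    WHOLE_MODEL = "global model behavior"
    SINGLE_PREDICTION = "one input-output pair"

class Presentation(Enum):
    SALIENCY_MAP = "heatmap over input pixels"
    TEXTUAL = "natural-language rationale"

class Methodology(Enum):
    POST_HOC = "explains an already-trained black-box model"
    INTRINSIC = "model is interpretable by design"

@dataclass
class XAIMethod:
    """One interpretability method tagged along the four dimensions."""
    name: str
    intent: Intent
    target: TargetObject
    presentation: Presentation
    methodology: Methodology

# Grad-CAM is a real post-hoc saliency method; the tags below reflect its
# common usage, not an official classification from this survey.
grad_cam = XAIMethod(
    name="Grad-CAM",
    intent=Intent.DECISION_JUSTIFICATION,
    target=TargetObject.SINGLE_PREDICTION,
    presentation=Presentation.SALIENCY_MAP,
    methodology=Methodology.POST_HOC,
)
```

Encoding methods this way would let a survey-style catalog be filtered or grouped along any single dimension, which is the organizational benefit the taxonomy aims for.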
Problem

Research questions and friction points this paper is trying to address.

Survey interpretability in visual recognition models
Propose taxonomy for human-centered interpretability methods
Explore evaluation metrics and new technology opportunities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes taxonomy for interpretable visual recognition methods
Categorizes methods by Intent, Object, Presentation, Methodology
Explores evaluation metrics and large multimodal models (see the metric sketch after this list)
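As one example of how an evaluation metric for saliency-style explanations can be made operational, below is a minimal sketch of the deletion test, a common faithfulness check in the XAI literature. The function signature, the `model`/`image`/`saliency` interfaces, the zero-pixel baseline, and the number of steps are all assumptions for illustration; the paper's own metric requirements may differ.

```python
import numpy as np

def deletion_score(model, image, saliency, steps=20):
    """Deletion test: zero out pixels from most to least salient and track
    how quickly the model's confidence in its predicted class drops.

    A steep drop (low average confidence) suggests the saliency map points
    at regions the model genuinely relies on. `model` is any callable that
    maps an image array to a scalar class probability; `saliency` is an
    (H, W) importance map aligned with `image`.
    """
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]  # pixel indices, most salient first
    per_step = max(1, (h * w) // steps)
    perturbed = image.copy()
    scores = [model(perturbed)]                 # confidence before any deletion
    for i in range(steps):
        idx = order[i * per_step:(i + 1) * per_step]
        ys, xs = np.unravel_index(idx, (h, w))
        perturbed[ys, xs] = 0                   # "delete" this batch of pixels
        scores.append(model(perturbed))
    # Lower average confidence across the deletion curve = more faithful map.
    return float(np.mean(scores))
```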
Qiyang Wan
Key Laboratory of AI Safety of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
Chengzhi Gao
Key Laboratory of AI Safety of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
Ruiping Wang
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision · Pattern Recognition · Machine Learning
Xilin Chen
Key Laboratory of AI Safety of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China