🤖 AI Summary
Current research on the explainability of vision recognition models lacks a systematic classification framework, hindering reliable deployment in high-stakes domains such as autonomous driving and medical diagnosis. Method: We propose the first human-centered, four-dimensional taxonomy, comprising explanation intent, target object, presentation modality, and methodological foundation, grounded in human-computer interaction (HCI) principles and eXplainable AI (XAI) theory. Through a comprehensive literature review, we formalize evaluation criteria for each dimension and conduct the first systematic analysis of the opportunities introduced by multimodal large language models (MLLMs). Contribution/Results: Our framework enables the structured organization of explainability methods, facilitates failure diagnosis in vision models, guides the principled design of explanation techniques, and establishes a rigorous theoretical foundation and an actionable roadmap for deploying interpretable models in safety-critical applications.
📝 Abstract
In recent years, visual recognition methods have advanced significantly and found applications across diverse fields. As researchers seek to understand the mechanisms behind these models' success, their deployment in critical areas such as autonomous driving and medical diagnostics creates a growing need to diagnose failures, which in turn drives interpretability research. This paper systematically reviews existing research on the interpretability of visual recognition models and proposes a taxonomy of methods from a human-centered perspective. The proposed taxonomy categorizes interpretable recognition methods along four dimensions, Intent, Object, Presentation, and Methodology, thereby establishing a systematic and coherent set of grouping criteria for these XAI methods. We further summarize the requirements for evaluation metrics and explore new opportunities enabled by recent technologies such as multimodal large language models (MLLMs). We aim to organize existing research in this domain and to inspire future investigations into the interpretability of visual recognition models.
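To make the four-dimensional classification concrete, a minimal sketch of how one might encode an XAI method's position in such a taxonomy is shown below. The specific category members (e.g. `POST_HOC`, `SALIENCY_MAP`) and the Grad-CAM classification are illustrative assumptions, not the paper's actual subcategories:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Illustrative members for each taxonomy axis; placeholders, not the
# paper's actual subcategories.
class Intent(Enum):
    DEBUG_MODEL = auto()        # diagnose model failures
    BUILD_TRUST = auto()        # support deployment decisions

class Object(Enum):
    WHOLE_MODEL = auto()        # global explanation of the model
    SINGLE_PREDICTION = auto()  # local explanation of one output

class Presentation(Enum):
    SALIENCY_MAP = auto()       # visual heatmap over the input
    TEXTUAL_RATIONALE = auto()  # natural-language explanation

class Methodology(Enum):
    POST_HOC = auto()           # explains a trained black-box model
    INTRINSIC = auto()          # model is interpretable by design

@dataclass(frozen=True)
class XAIMethodProfile:
    """Position of one XAI method within the four-dimensional taxonomy."""
    name: str
    intent: Intent
    object: Object
    presentation: Presentation
    methodology: Methodology

# Example: a Grad-CAM-style saliency method, classified along the four axes
# (classification choices here are the author's illustration).
grad_cam = XAIMethodProfile(
    name="Grad-CAM",
    intent=Intent.DEBUG_MODEL,
    object=Object.SINGLE_PREDICTION,
    presentation=Presentation.SALIENCY_MAP,
    methodology=Methodology.POST_HOC,
)
```

Organizing surveyed methods as such records would let one query, for instance, all post-hoc methods whose presentation is a saliency map.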