🤖 AI Summary
To address the opacity and debugging challenges inherent in deep neural networks—particularly vision models—this paper introduces a semantic-level runtime analysis method leveraging multimodal vision-language models (e.g., CLIP). The approach tackles the problem of black-box decision-making by enabling interpretable, concept-aware diagnostics during inference, without requiring human annotations. Its core contributions are: (1) a formulation of semantic heatmaps and differential heatmaps, which enable fault localization at both the encoder and classification-head levels while attributing decisions to high-level semantic concepts; and (2) a lightweight cosine-similarity comparison strategy that detects misclassifications and adversarial vulnerabilities at runtime. Evaluated in a case study on a ResNet-based classifier trained on the RIVAL10 dataset, the method identifies faulty modules and semantic deficiencies and filters out defects at runtime. This advances model reliability and interpretability through principled, semantics-driven diagnostics.
📝 Abstract
Debugging Deep Neural Networks (DNNs), particularly vision models, is very challenging due to the complex and opaque decision-making processes in these networks. In this paper, we explore multi-modal Vision-Language Models (VLMs), such as CLIP, to automatically interpret the opaque representation space of vision models using natural language. This, in turn, enables a semantic analysis of model behavior using human-understandable concepts, without requiring costly human annotations. Key to our approach is the notion of a semantic heatmap, which succinctly captures the statistical properties of a DNN in terms of the concepts discovered with the VLM; these heatmaps are computed off-line using a held-out data set. We show the utility of semantic heatmaps for fault localization -- an essential step in debugging -- in vision models. Our proposed technique helps localize the fault in the network (encoder vs. head) and also highlights the responsible high-level concepts, by leveraging novel differential heatmaps, which summarize the semantic differences between the correct and incorrect behavior of the analyzed DNN. We further propose a lightweight runtime analysis to detect and filter out defects at runtime, thus improving the reliability of the analyzed DNNs. The runtime analysis works by measuring and comparing the similarity between the heatmap computed for a new (unseen) input and the heatmaps computed a priori for correct vs. incorrect DNN behavior. We consider two types of defects: misclassifications and vulnerabilities to adversarial attacks. We demonstrate the debugging and runtime analysis on a case study involving a complex ResNet-based classifier trained on the RIVAL10 dataset.
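The runtime check described above can be illustrated with a minimal sketch: a new input's semantic heatmap (a vector of per-concept activation statistics, e.g. derived from CLIP similarity scores) is compared via cosine similarity against the heatmaps pre-computed for correct and incorrect model behavior, and the input is flagged when it resembles the incorrect profile more closely. All function names here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two concept-activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_semantic_heatmap(concept_scores):
    """Aggregate per-concept scores over a held-out set into one heatmap.

    concept_scores: array of shape (num_samples, num_concepts),
    e.g. CLIP image-text similarity scores (assumed input format).
    """
    return np.mean(concept_scores, axis=0)

def flag_defect(input_heatmap, correct_heatmap, incorrect_heatmap):
    """Flag the input if its heatmap is more similar to the profile
    of incorrect behavior than to the profile of correct behavior."""
    sim_correct = cosine_similarity(input_heatmap, correct_heatmap)
    sim_incorrect = cosine_similarity(input_heatmap, incorrect_heatmap)
    return sim_incorrect > sim_correct
```

In practice the "correct" and "incorrect" heatmaps would be built off-line from held-out inputs the model classifies correctly and incorrectly, respectively; at inference time only one vector comparison per reference heatmap is needed, which keeps the filter lightweight.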