Visual Explanation via Similar Feature Activation for Metric Learning

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the critical lack of interpretability in metric learning models—particularly those lacking fully connected classification layers—this paper introduces SFAM, the first visual explanation method specifically designed for such models. SFAM quantifies each channel's contribution to inter-sample similarity discrimination directly in the embedding space, under Euclidean or cosine similarity, via a novel channel-wise contribution importance score (CIS). It operates solely on intermediate CNN feature maps, requiring no label supervision or auxiliary classification layers, and generates high-fidelity feature activation maps through channel-weighted fusion. Evaluated across multiple benchmark datasets, SFAM achieves substantial improvements in localization accuracy (e.g., +12.3% Top-1 localization accuracy) and qualitative explanation quality. This work marks the first systematic extension of visual explanation techniques to the metric learning paradigm, providing a new tool for trustworthy model evaluation and algorithm development.

📝 Abstract
Visual explanation maps enhance the trustworthiness of decisions made by deep learning models and offer valuable guidance for developing new algorithms in image recognition tasks. Class activation maps (CAM) and their variants (e.g., Grad-CAM and Relevance-CAM) have been extensively employed to explore the interpretability of softmax-based convolutional neural networks, which require a fully connected layer as the classifier for decision-making. However, these methods cannot be directly applied to metric learning models, as such models lack a fully connected layer functioning as a classifier. To address this limitation, we propose a novel visual explanation method termed Similar Feature Activation Map (SFAM). This method introduces the channel-wise contribution importance score (CIS) to measure feature importance, derived from the similarity measurement between two image embeddings. The explanation map is constructed by linearly combining the proposed importance weights with the feature map from a CNN model. Quantitative and qualitative experiments show that SFAM provides highly promising interpretable visual explanations for CNN models using Euclidean distance or cosine similarity as the similarity metric.
Problem

Research questions and friction points this paper is trying to address.

Lack of visual explanation methods for metric learning models
Existing CAM methods require fully connected classifier layers
Need interpretable visual explanations for similarity-based CNN models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Similar Feature Activation Map (SFAM)
Uses channel-wise contribution importance score (CIS)
Combines importance weights with CNN feature maps
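
The mechanism above can be sketched in a few lines of NumPy. This is an illustrative reading of the summary, not the paper's implementation: the function name `sfam_map`, the global-average-pooling step used to obtain embeddings, and the exact CIS normalization are assumptions; the paper defines CIS from the similarity between two image embeddings and fuses it with the feature map.

```python
import numpy as np

def sfam_map(feat_a, feat_b):
    """Illustrative SFAM-style explanation map for image A.

    feat_a, feat_b: (C, H, W) intermediate CNN feature maps of the
    two images being compared. Returns an (H, W) activation map.
    """
    # Global average pooling -> per-image embeddings of shape (C,).
    # (Assumption: the paper's embedding may be produced differently.)
    emb_a = feat_a.mean(axis=(1, 2))
    emb_b = feat_b.mean(axis=(1, 2))
    # Channel-wise contribution importance score (CIS): each channel's
    # additive share of the cosine similarity between the embeddings.
    denom = np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-12
    cis = (emb_a * emb_b) / denom                      # (C,)
    # Explanation map: linear combination of A's feature-map channels
    # weighted by CIS, rectified to keep positive evidence only.
    sal = np.maximum((cis[:, None, None] * feat_a).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization.
    if sal.max() > 0:
        sal = sal / sal.max()
    return sal
```

In practice the resulting map would be upsampled to the input resolution and overlaid on the image; note that no classifier layer or label is used anywhere, which is the property that lets this style of explanation apply to metric learning models.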