Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation

📅 2024-09-14
🏛️ IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Weak interpretability of semantic segmentation models and the lack of multi-scale semantic traceability in prototype learning motivate this work. We propose a multi-scale explicit prototype learning framework. Methodologically, we introduce a novel multi-scale explicit prototype layer coupled with a cross-scale sparse grouping mechanism, enabling hierarchical, sparse similarity mapping between local image regions and prototypes drawn from real training patches. By integrating multi-scale feature extraction, prototype-based representation learning, and similarity-driven dense prediction, our approach strengthens the semantic traceability between predictions and learned patterns. Experiments on Pascal VOC, Cityscapes, and ADE20K demonstrate that the method significantly improves model sparsity and interpretability while narrowing the performance gap with non-interpretable black-box models. To our knowledge, this is the first framework to jointly achieve hierarchical activation, sparse prototype selection, and semantically grounded, interpretable prototype–input interactions.

๐Ÿ“ Abstract
Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable. The model selects real patches seen during training as prototypes and constructs the dense prediction map based on the similarity between parts of the test image and the prototypes. This improves interpretability since the user can inspect the link between the predicted output and the patterns learned by the model in terms of prototypical information. In this paper, we propose a method for interpretable semantic segmentation that leverages multi-scale image representation for prototypical part learning. First, we introduce a prototype layer that explicitly learns diverse prototypical parts at several scales, leading to multi-scale representations in the prototype activation output. Then, we propose a sparse grouping mechanism that produces multi-scale sparse groups of these scale-specific prototypical parts. This provides a deeper understanding of the interactions between multi-scale object representations while enhancing the interpretability of the segmentation model. The experiments conducted on Pascal VOC, Cityscapes, and ADE20K demonstrate that the proposed method increases model sparsity, improves interpretability over existing prototype-based methods, and narrows the performance gap with the non-interpretable counterpart models. Code is available at github.com/eceo-epfl/ScaleProtoSeg.
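The core operation the abstract describes, scoring every spatial location of a feature map against a bank of prototype vectors, can be sketched as follows. This is a minimal illustration under assumed shapes, not the paper's implementation: the prototypes here are random placeholders, whereas in the paper they are tied to real training patches, and the actual model repeats this matching over feature maps at several scales.

```python
import numpy as np

def prototype_activations(features, prototypes):
    """Cosine similarity between each spatial location of a feature map
    and a bank of prototype vectors.

    features:   (H, W, D) feature map from a backbone (assumed layout)
    prototypes: (P, D) prototype vectors (random stand-ins here)
    returns:    (H, W, P) per-prototype similarity maps
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return f @ p.T  # (H, W, P), values in [-1, 1]

# Toy example: one scale only; the multi-scale variant would pool the
# feature map to several resolutions, match, and upsample the results.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 16))
protos = rng.standard_normal((4, 16))
sim = prototype_activations(feats, protos)
print(sim.shape)  # (8, 8, 4)
```

Because each channel of `sim` traces back to one prototype, a user can inspect which training patch drove the prediction at any pixel, which is the interpretability mechanism the abstract refers to.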
Problem

Research questions and friction points this paper is trying to address.

Enhancing interpretability in semantic segmentation using prototypes
Learning multi-scale prototypical parts for better object representation
Improving model sparsity and performance in interpretable segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale prototype layer for diverse part learning
Sparse grouping mechanism for multi-scale interactions
Improved interpretability and performance in segmentation
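The sparse grouping idea in the bullets above, combining many per-prototype activation maps through only a handful of active coefficients, can be illustrated with a top-k selection. This is a hedged sketch of one plausible sparsification scheme, not the paper's exact mechanism; the function name, the top-k rule, and the normalization are all assumptions for illustration.

```python
import numpy as np

def sparse_group(activations, weights, k=2):
    """Combine per-prototype activation maps into a single group map,
    keeping only the k largest group coefficients (sparse selection).

    activations: (H, W, P) prototype similarity maps
    weights:     (P,) group coefficients (learned in a real model)
    """
    w = np.zeros_like(weights)
    top = np.argsort(weights)[-k:]      # indices of the k largest weights
    w[top] = weights[top]               # zero out all other coefficients
    w = w / (np.abs(w).sum() + 1e-8)    # normalize the survivors
    return activations @ w              # (H, W) group activation map

rng = np.random.default_rng(1)
acts = rng.standard_normal((8, 8, 4))
coeffs = np.array([0.1, 0.9, 0.3, 0.05])
group_map = sparse_group(acts, coeffs, k=2)
print(group_map.shape)  # (8, 8)
```

Sparsity is what makes the grouping readable: with only k nonzero coefficients per group, each group's output is attributable to a short, inspectable list of prototypes rather than a dense mixture of all of them.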
🔎 Similar Papers
No similar papers found.