🤖 AI Summary
This work addresses the limited interpretability of semantic segmentation models by proposing ProtoSeg, an interpretable segmentation framework based on prototype retrieval. Methodologically, ProtoSeg models each semantic class as a set of visual prototypes and performs patch-level similarity matching against the training set to retrieve the most relevant image patches as prediction support. A diversity loss encourages the prototypes of each class to capture semantically rich, discriminative, and complementary local parts, improving intra-class coverage. ProtoSeg establishes, for the first time, a "prototype–part–semantics" aligned interpretable segmentation paradigm. Evaluated on Pascal VOC and Cityscapes, ProtoSeg achieves segmentation accuracy competitive with strong baselines while providing intuitive, verifiable decision rationales, significantly enhancing model transparency and human interpretability.
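The two mechanisms named above (scoring patches against per-class prototypes, and a diversity penalty on prototypes of the same class) can be sketched as follows. This is an illustrative NumPy sketch, not the paper's actual implementation: the function names, the max-over-prototypes scoring rule, and the pairwise-cosine form of the diversity term are assumptions for exposition.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize along the feature axis so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def patch_class_scores(patches, prototypes):
    """Score each patch against each class via its most similar prototype.

    patches:    (P, D) patch embeddings
    prototypes: (C, K, D) array of K prototypes per class (names/shapes assumed)
    returns:    (P, C) class scores = max cosine similarity over a class's prototypes
    """
    p = l2_normalize(patches)               # (P, D)
    q = l2_normalize(prototypes)            # (C, K, D)
    sims = np.einsum("pd,ckd->pck", p, q)   # (P, C, K) cosine similarities
    return sims.max(axis=-1)                # best-matching prototype per class

def diversity_loss(prototypes):
    """Penalize pairwise similarity among prototypes of the same class (K >= 2)."""
    q = l2_normalize(prototypes)            # (C, K, D)
    gram = np.einsum("ckd,cld->ckl", q, q)  # (C, K, K) within-class similarity matrix
    K = prototypes.shape[1]
    off_diag = gram - np.eye(K)[None]       # remove self-similarity on the diagonal
    # Average positive off-diagonal similarity; lower means more diverse prototypes.
    return np.maximum(off_diag, 0.0).sum() / (prototypes.shape[0] * K * (K - 1))

# Toy usage: 16 patches, 3 classes, 4 prototypes each, 8-dim features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
protos = rng.normal(size=(3, 4, 8))
scores = patch_class_scores(patches, protos)   # (16, 3)
labels = scores.argmax(axis=1)                 # per-patch class prediction
penalty = diversity_loss(protos)               # scalar, added to the training loss
```

In the interpretable-prediction view, the argmax above also identifies *which* prototype supported each patch, so the matching training patch can be shown to the user as the decision rationale.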
📝 Abstract
We introduce ProtoSeg, a novel model for interpretable semantic image segmentation that constructs its predictions from similar patches in the training set. To achieve accuracy comparable to baseline methods, we adapt the mechanism of prototypical parts and introduce a diversity loss function that increases the variety of prototypes within each class. We show that ProtoSeg discovers semantic concepts, in contrast to standard segmentation models. Experiments conducted on the Pascal VOC and Cityscapes datasets confirm the precision and transparency of the presented method.