Dictionary-based Framework for Interpretable and Consistent Object Parsing

📅 2025-02-26

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper addresses insufficient interpretability and hierarchical consistency in part-level semantic segmentation. It proposes CoCal, a framework built on a dictionary-based mask transformer in which each dictionary component is explicitly bound to a semantic class and the components are organized to follow the semantic hierarchy. CoCal combines within-level contrastive learning over dictionary components with cross-level logical constraints, and adds pixel-wise logical post-processing so that a pixel assigned to a part is also assigned to that part's object. On PartImageNet and Pascal-Part-108, CoCal improves part-level mIoU over previous methods by 2.08% and 0.70%, respectively, and also improves object-level metrics on both benchmarks.
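
The part-to-object consistency rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the label ids, and the decision to trust the object prediction and restrict each pixel's part prediction to parts of that object are all assumptions made for illustration.

```python
from typing import Dict, List

BACKGROUND = 0  # assumed label id meaning "no part"


def enforce_part_object_consistency(
    part_scores: List[List[float]],
    object_labels: List[int],
    part_to_object: Dict[int, int],
) -> List[int]:
    """Return one part label per pixel that agrees with the pixel's object label.

    part_scores[i][p] is the score of part p at pixel i (hypothetical layout).
    object_labels[i] is the predicted object at pixel i.
    part_to_object maps each part id to the id of the object it belongs to.
    """
    consistent = []
    for scores, obj in zip(part_scores, object_labels):
        # Keep only parts whose parent object matches the pixel's object.
        allowed = [p for p in range(len(scores)) if part_to_object.get(p) == obj]
        if not allowed:
            consistent.append(BACKGROUND)
        else:
            consistent.append(max(allowed, key=lambda p: scores[p]))
    return consistent


# Hypothetical label space: objects 0=background, 1=dog, 2=car;
# parts 0=background, 1=dog head, 2=dog body, 3=car wheel, 4=car body.
PART_TO_OBJECT = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}

scores = [
    [0.1, 0.2, 0.1, 0.5, 0.1],  # raw argmax is "car wheel", but the object is "dog"
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.1, 0.3, 0.4],
]
parts = enforce_part_object_consistency(scores, [1, 0, 2], PART_TO_OBJECT)
```

In this toy input the first pixel's unconstrained best part belongs to a different object, so the sketch falls back to the best-scoring part of the predicted object instead.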

📝 Abstract

In this work, we present CoCal, an interpretable and consistent object parsing framework built on a dictionary-based mask transformer. Designed around Contrastive Components and Logical Constraints, CoCal rethinks existing cluster-based mask transformer architectures used in segmentation. Specifically, CoCal utilizes a set of dictionary components, each explicitly linked to a specific semantic class. To advance this concept, CoCal introduces a hierarchical formulation of dictionary components that aligns with the semantic hierarchy. This is achieved by integrating both within-level contrastive components and cross-level logical constraints. Concretely, CoCal employs a component-wise contrastive algorithm at each semantic level, which contrasts dictionary components of the same class against those of different classes. Furthermore, CoCal addresses logical concerns through a cross-level contrastive learning objective, which ensures that the dictionary component representing a particular part is closer to its corresponding object component than to those of other objects. To further enhance the logical relation modeling, we implement a post-processing function based on the principle that a pixel assigned to a part should also be assigned to its corresponding object. With these innovations, CoCal establishes new state-of-the-art performance on both PartImageNet and Pascal-Part-108, outperforming previous methods by 2.08% and 0.70% in part mIoU, respectively. Moreover, CoCal shows notable improvements in object-level metrics across these benchmarks, indicating that it not only refines parsing at the part level but also improves the overall quality of object segmentation.
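
The cross-level objective described above (each part component should be closer to its own object component than to other objects) can be sketched as an InfoNCE-style loss. This is an illustrative assumption, not the paper's exact formulation: the function names, the use of cosine similarity, and the temperature value are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_level_loss(
    part_components: List[Sequence[float]],
    object_components: List[Sequence[float]],
    parent_of: List[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Mean over parts of -log softmax(similarity to objects)[parent object]."""
    total = 0.0
    for i, part in enumerate(part_components):
        logits = [cosine(part, obj) / temperature for obj in object_components]
        # Numerically stable log-sum-exp over all object components.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[parent_of[i]]
    return total / len(part_components)


# Toy 2-D components: two objects, and one part that sits near each object.
objects = [[1.0, 0.0], [0.0, 1.0]]
parts = [[0.9, 0.1], [0.1, 0.9]]

aligned = cross_level_loss(parts, objects, parent_of=[0, 1])
misaligned = cross_level_loss(parts, objects, parent_of=[1, 0])
```

When each part is assigned its nearest object as parent, the loss is small; swapping the parents makes it large, which is the behavior a cross-level constraint of this kind would penalize during training.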

Problem

Research questions and friction points this paper is trying to address.

Enhances object parsing interpretability

Improves semantic consistency in segmentation

Advances hierarchical dictionary component formulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dictionary-based mask transformer framework

Hierarchical formulation of dictionary components

Component-wise contrastive algorithm for semantic levels
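
A within-level, component-wise contrastive term could look like the supervised-contrastive-style sketch below. The source does not spell out the loss, so everything here is an assumption for illustration: the function name, cosine similarity, the temperature, and the premise that several dictionary components can share a class label.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def within_level_contrastive_loss(
    components: List[Sequence[float]],
    labels: List[int],
    temperature: float = 0.1,  # assumed value
) -> float:
    """Pull same-class components together and push other classes apart."""
    n = len(components)
    total, anchors = 0.0, 0
    for a in range(n):
        positives = [k for k in range(n) if k != a and labels[k] == labels[a]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits = {
            k: _cosine(components[a], components[k]) / temperature
            for k in range(n)
            if k != a
        }
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values())
        )
        total += sum(log_denominator - logits[p] for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Four toy components forming two tight pairs in 2-D.
components = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

clustered = within_level_contrastive_loss(components, labels=[0, 0, 1, 1])
mixed = within_level_contrastive_loss(components, labels=[0, 1, 0, 1])
```

The loss is low when components that share a label are also close in embedding space, and high when the labels cut across the geometric clusters.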

🔎 Similar Papers

Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations