🤖 AI Summary
To address the challenges of fine-grained category discrimination and severe semantic misalignment under weak supervision in open-vocabulary semantic segmentation, this paper proposes SynSeg. The method introduces three key innovations: (1) a Multi-Category Contrastive Learning (MCCL) mechanism that jointly enforces intra-category alignment and inter-category separation to enhance feature discriminability; (2) a Feature Synergy Structure (FSS) that mitigates the foreground bias introduced by the visual encoder; and (3) a synergistic integration of prior fusion, semantic-activation-map enhancement, and feature reconstruction to improve semantic localization accuracy under weak supervision. Evaluated on four standard benchmarks—PASCAL VOC, PASCAL-Context, COCO-Object, and Cityscapes—SynSeg achieves new state-of-the-art mIoU scores, surpassing prior methods by 4.5%, 8.9%, 2.6%, and 2.0%, respectively. These results significantly advance the performance frontier of open-vocabulary weakly supervised semantic segmentation.
📝 Abstract
Semantic segmentation in open-vocabulary scenarios presents significant challenges due to the wide range and fine granularity of semantic categories. Existing weakly-supervised methods often rely on category-specific supervision and ill-suited feature construction for contrastive learning, leading to semantic misalignment and poor performance. In this work, we propose a novel weakly-supervised approach, SynSeg, to address these challenges. SynSeg performs Multi-Category Contrastive Learning (MCCL) as a stronger training signal, together with a new feature reconstruction framework named Feature Synergy Structure (FSS). Specifically, the MCCL strategy combines both intra- and inter-category alignment and separation, so that the model learns correlations among the different categories within the same image. Moreover, FSS reconstructs discriminative features for contrastive learning through prior fusion and semantic-activation-map enhancement, effectively avoiding the foreground bias introduced by the visual encoder. Overall, SynSeg improves semantic localization and discrimination under weak supervision. Extensive experiments on benchmarks demonstrate that our method outperforms state-of-the-art (SOTA) baselines: for instance, SynSeg achieves higher accuracy than SOTA baselines by 4.5% on VOC, 8.9% on Context, 2.6% on Object, and 2.0% on City.