I$^2$R: Inter and Intra-image Refinement in Few Shot Segmentation

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Few-shot semantic segmentation suffers from a semantic gap between support and query sets, and from erroneous matching caused by visually similar but semantically conflicting regions within images. To address both problems, this paper proposes a framework that jointly optimizes cross-image and intra-image feature consistency. Its key contributions are: (1) introducing class-specific high-level semantic representations to improve cross-image region localization accuracy; (2) designing a directional masking strategy to explicitly suppress spurious feature responses with high similarity but inconsistent labels; and (3) establishing a global semantic aggregation module coupled with bidirectional support-query image interaction. Evaluated on PASCAL-5$^i$ and COCO-20$^i$ under the 1-shot setting, the method improves mIoU over state-of-the-art approaches by 1.9% and 2.1%, respectively.

📝 Abstract
The annotation bottleneck in semantic segmentation has driven significant interest in few-shot segmentation (FSS), which aims to develop segmentation models capable of generalizing rapidly to novel classes using minimal exemplars. Conventional training paradigms typically generate query prior maps by extracting masked-area features from support images, and then make predictions guided by these prior maps. However, current approaches remain constrained by two critical limitations stemming from inter- and intra-image discrepancies, both of which significantly degrade segmentation performance: 1) the semantic gap between support and query images results in mismatched features and inaccurate prior maps; 2) visually similar yet semantically distinct regions within support or query images lead to false negative or false positive predictions. We propose a novel FSS method called I$^2$R, which 1) uses category-specific high-level representations that aggregate global semantic cues from support and query images, enabling more precise inter-image region localization and addressing the first limitation; and 2) applies a directional masking strategy that suppresses inconsistent support-query pixel pairs, which exhibit high feature similarity but conflicting masks, to mitigate the second issue. Experiments demonstrate that our method outperforms state-of-the-art approaches, achieving improvements of 1.9% and 2.1% in mIoU under the 1-shot setting on the PASCAL-5$^i$ and COCO-20$^i$ benchmarks, respectively.
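The conventional prior-map step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a common formulation in which each query pixel is scored by its maximum cosine similarity to the masked (foreground) support pixels, followed by min-max normalization. The names `cosine` and `prior_map` and the flat list-of-vectors layout are hypothetical, chosen only to keep the example dependency-free.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def prior_map(query_feats, support_feats, support_mask):
    """Score each query pixel against the foreground support pixels.

    query_feats:   one feature vector per query pixel
    support_feats: one feature vector per support pixel
    support_mask:  1 for foreground, 0 for background, per support pixel
    (Assumed layout: spatial dimensions flattened into a list.)
    """
    # Keep only the masked-area (foreground) support features.
    foreground = [f for f, m in zip(support_feats, support_mask) if m == 1]
    if not foreground:
        return [0.0] * len(query_feats)

    # Each query pixel takes its best match among the foreground pixels.
    raw = [max(cosine(q, s) for s in foreground) for q in query_feats]

    # Min-max normalize the scores to the range [0, 1].
    lo, hi = min(raw), max(raw)
    if hi - lo < 1e-12:
        return [0.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]
```

In this formulation the prior depends entirely on pixel-level similarity to the support foreground, which is why a semantic gap between the two images, as described in limitation 1, can translate directly into an inaccurate prior map.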
Problem

Research questions and friction points this paper is trying to address.

Addresses semantic gap between support and query images
Reduces false predictions in visually similar regions
Improves few-shot segmentation with inter- and intra-image refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates global semantic cues for precise localization
Uses directional masking to suppress inconsistent pixel pairs
Addresses both inter- and intra-image discrepancies
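The suppression criterion behind the masking idea can be sketched as follows. The abstract only states that pairs with high feature similarity but conflicting masks are suppressed; it does not say how the direction is defined, which query mask is used, or what threshold applies. The function name `suppress_inconsistent_pairs`, the `threshold` parameter, and the choice to zero out suppressed entries are therefore assumptions made purely for illustration, and the sketch ignores the directional aspect.

```python
def suppress_inconsistent_pairs(sim, query_mask, support_mask, threshold=0.8):
    """Zero out support-query pairs that look alike but disagree in label.

    sim:          sim[q][s] is the feature similarity between query pixel q
                  and support pixel s
    query_mask:   0/1 label per query pixel (assumed to be available)
    support_mask: 0/1 label per support pixel
    threshold:    hypothetical cutoff for "high" similarity
    """
    cleaned = []
    for q, row in enumerate(sim):
        new_row = []
        for s, value in enumerate(row):
            conflicting = query_mask[q] != support_mask[s]
            if conflicting and value >= threshold:
                new_row.append(0.0)  # similar features, conflicting labels
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned
```

For instance, with `sim = [[0.9, 0.2], [0.95, 0.85]]` and both masks equal to `[1, 0]`, only the entry `sim[1][0]` is suppressed: it pairs a background query pixel with a foreground support pixel at similarity 0.95.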