INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in general-purpose promptable segmentation: inaccurate instance-level prompt generation and poor cross-domain robustness—particularly on camouflaged objects and medical images. To this end, we propose an adaptive negative-sample mining and two-stage collaborative optimization framework. Our method introduces the first instance-level negative mining mechanism, which dynamically suppresses misleading priors while enhancing high-contrast, trustworthy ones. It further integrates vision-language model-based prompt engineering, progressive prompt filtering, and semantic consistency constraints to jointly calibrate prompt generation and mask prediction. Evaluated on six heterogeneous benchmarks—including camouflaged object and multi-modal medical imaging datasets—our approach achieves significant improvements over state-of-the-art methods. It demonstrates superior generalization, cross-domain robustness, and compatibility with diverse segmentation backbones.

📝 Abstract
Task-generic promptable image segmentation aims to segment diverse samples under a single task description using only one task-generic prompt. Current methods leverage the generalization capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts become poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge whilst increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures that the segmentation of each image instance correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
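
The abstract does not spell out how "negative mining with higher contrast" is computed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes each candidate instance-specific prompt carries two hypothetical VLM confidence scores, one on the full image and one after the candidate region is masked out, and treats their difference as the contrast. The names `Candidate`, `mine_negatives`, `select_prompt` and the `min_contrast` threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    prompt: str
    # Hypothetical VLM confidence for this prompt on the full image.
    score_with_region: float
    # Hypothetical confidence after the candidate region is masked out.
    score_without_region: float


def contrast(c: Candidate) -> float:
    """Assumed contrast: how much confidence drops when the region is hidden."""
    return c.score_with_region - c.score_without_region


def mine_negatives(
    candidates: List[Candidate], min_contrast: float = 0.0
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (kept, negatives); kept is sorted by descending contrast."""
    kept = sorted(
        (c for c in candidates if contrast(c) > min_contrast),
        key=contrast,
        reverse=True,
    )
    negatives = [c for c in candidates if contrast(c) <= min_contrast]
    return kept, negatives


def select_prompt(
    candidates: List[Candidate], min_contrast: float = 0.0
) -> Optional[str]:
    """Return the highest-contrast prompt, or None if every candidate is a negative."""
    kept, _ = mine_negatives(candidates, min_contrast)
    return kept[0].prompt if kept else None


if __name__ == "__main__":
    cands = [
        Candidate("leaf", 0.80, 0.75),  # barely changes when masked: weak evidence
        Candidate("frog", 0.70, 0.20),  # large drop when masked: strong evidence
        Candidate("rock", 0.30, 0.40),  # confidence rises when masked: negative
    ]
    print(select_prompt(cands))
```

Under these assumptions, a candidate whose confidence barely changes (or rises) when its region is hidden is treated as misleading prior knowledge and filtered out, while the candidate with the largest drop is kept as the most plausible prompt.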
Problem

Research questions and friction points this paper is trying to address.

Inaccurate Prompt Generation
Image Segmentation
Performance Limitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

INT method
Semantic Mask Generation
Relevant Information Optimization