PrAda: Few-Shot Visual Adaptation for Text-Prompted Segmentation

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of text-prompted segmentation models on specialized target domains under domain shift, which the authors trace mainly to misclassification rather than to mask generation. To address it, they introduce a new few-shot visual adaptation setting for text-prompted segmentation and propose PrAda, a parameter-efficient prototype adaptation method that keeps the original model frozen. PrAda learns class-specific prototypes by combining fine-grained pixel-level features with high-level Transformer representations, then fuses the prototype-based predictions with the original text-based predictions through a learned importance factor. Experiments on five benchmarks spanning semantic, instance, and panoptic segmentation show significant improvements over state-of-the-art methods and the authors' proposed baselines, while preserving the model's zero-shot potential.

📝 Abstract

Segmenting images is critical for visual understanding but demands extensive pixel-level annotations. Foundation models have enabled new paradigms for predicting new classes guided by textual prompts, without annotations from the target domain. Yet, on specialized target domains, far from the original pre-training data, their performance degrades. We study the errors of existing methods under such domain shift, finding that misclassification rather than mask generation is the main culprit. To address this, we introduce the novel problem of Few-Shot Visual Adaptation for Text-Prompted Segmentation. This kind of adaptation has been widely studied for image classification, but it remains unexplored for segmentation. We tackle this task with Prototype Adaptation (PrAda), a novel, parameter-efficient method that adapts a frozen text-prompted segmentation model. Our approach learns class-specific prototypes by combining fine-grained pixel features and high-level transformer representations, which are then fused with the original text-based predictions through a learned importance factor. This preserves the model's zero-shot potential while enabling strong adaptation to new domains. Experiments across semantic, instance, and panoptic segmentation on five benchmarks demonstrate that PrAda yields significant improvements over state-of-the-art methods and our proposed baselines.
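
To make the prototype-and-fusion idea in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names (combine_features, build_prototypes, fuse_logits), the use of concatenation to combine the two feature types, mean-pooled prototypes scored by cosine similarity, the additive fusion rule, and the per-class weight alpha with a fixed scale. In PrAda the prototypes and the importance factor are learned; here they are simply computed or set by hand.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis` (zero vectors stay zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def combine_features(pixel_feats, transformer_feats):
    """ASSUMPTION: combine the two feature types by concatenating their
    normalized versions. Shapes: (N, D1) and (N, D2) -> (N, D1 + D2)."""
    return np.concatenate(
        [l2_normalize(pixel_feats), l2_normalize(transformer_feats)], axis=1
    )


def build_prototypes(support_feats, support_labels, num_classes):
    """ASSUMPTION: one prototype per class, the mean of the normalized
    few-shot support features of that class. Returns (num_classes, D)."""
    feats = l2_normalize(support_feats)
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        members = feats[support_labels == c]
        if len(members) > 0:  # classes without support samples keep a zero prototype
            protos[c] = members.mean(axis=0)
    return l2_normalize(protos)


def fuse_logits(text_logits, query_feats, prototypes, alpha, scale=10.0):
    """ASSUMPTION: additive fusion. `alpha` has shape (num_classes,) and
    weights the prototype-based logits per class; alpha == 0 returns the
    frozen model's text-based logits unchanged."""
    visual_logits = scale * l2_normalize(query_feats) @ prototypes.T
    return text_logits + alpha * visual_logits


# Toy usage: two classes, four support features, one query segment.
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
prototypes = build_prototypes(support, labels, num_classes=2)

query = np.array([[0.0, 1.0]])        # visually close to class 1
text_logits = np.array([[1.0, 0.5]])  # text-based prediction prefers class 0

zero_shot = fuse_logits(text_logits, query, prototypes, alpha=np.zeros(2))
adapted = fuse_logits(text_logits, query, prototypes, alpha=np.ones(2))
print(zero_shot.argmax(axis=1), adapted.argmax(axis=1))
```

In this toy setup the text-based logits alone pick class 0, while the fused logits pick class 1, which mirrors the abstract's observation that misclassification, not mask generation, is the main error under domain shift. Setting alpha to zero recovers the original predictions, which is one simple way to picture how such a fusion can leave zero-shot behavior intact.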

Problem

Research questions and friction points this paper is trying to address.

Few-Shot Visual Adaptation

Text-Prompted Segmentation

Domain Shift

Segmentation

Foundation Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Adaptation

Text-Prompted Segmentation

Prototype Adaptation