Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-shot Semantic Segmentation

📅 2025-06-28

📈 Citations: 0

✨ Influential: 0

career value

151K/year

🤖 AI Summary

Generalized few-shot semantic segmentation (GFSS) confronts the dual challenges of scarce annotations for novel classes and maintaining performance on base classes. Existing prototype-based methods predominantly employ deterministic modeling, limiting generalization. This paper proposes FewCLIP, the first GFSS framework to incorporate CLIP’s multimodal priors: it dynamically adapts fixed text prototypes via learnable visual calibration prototypes and introduces a distributional regularization mechanism for uncertainty-aware probabilistic prototype learning. The method jointly optimizes frozen text features and randomized visual prototypes. Evaluated on PASCAL-5ⁱ and COCO-20ⁱ, FewCLIP significantly surpasses state-of-the-art methods under both generalized few-shot and class-incremental settings, achieving superior performance while ensuring strong generalization and stable base-class accuracy.

Technology Category

Application Category

📝 Abstract

Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototypes learning. However, existing prototype-based methods are inherently deterministic, limiting the adaptability of learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from the pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual calibration prototypes, leading to a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, FewCLIP introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel class data while enhancing generalization. Extensive experimental results on PASCAL-5$^i$ and COCO-20$^i$ datasets demonstrate that our proposed FewCLIP significantly outperforms state-of-the-art approaches across both GFSS and class-incremental setting. The code is available at https://github.com/jliu4ai/FewCLIP.

Problem

Research questions and friction points this paper is trying to address.

Enhance few-shot segmentation with probabilistic prototype calibration

Improve adaptability of prototypes for novel classes with scarce data

Mitigate overfitting in generalized few-shot semantic segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic prototype calibration for adaptive learning

Visual calibration prototypes refine textual prototypes

Distribution regularization ensures uncertainty-aware learning

🔎 Similar Papers

No similar papers found.