PMCE: Probabilistic Multi-Granularity Semantics with Caption-Guided Enhancement for Few-Shot Learning

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses few-shot learning, where scarce support samples often yield biased prototype estimates and poor generalization. To mitigate this, the authors propose the PMCE framework, which is the first to introduce caption-guided, instance-level semantic augmentation on the query side, and which retrieves statistics of related base classes from a non-parametric knowledge bank using class-name embeddings, thereby enabling bidirectional refinement of both support and query representations. PMCE integrates multi-granularity semantics through lightweight modules (frozen BLIP-generated image captions, CLIP-based class embeddings, MAP-based prototype updating, and consistency regularization) without extensive retraining. Extensive experiments show that PMCE significantly outperforms strong baselines across four benchmarks, achieving a 7.71% absolute improvement over the best existing semantic method in the 1-shot setting on MiniImageNet.
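The retrieval step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `retrieve_base_priors`, the top-k cutoff, and the similarity-weighted aggregation of stored means are all assumptions for the sketch; the paper only states that relevant base-class statistics are retrieved via class-name embedding similarity and aggregated into a prior.

```python
import numpy as np

def retrieve_base_priors(novel_name_emb, bank_name_embs, bank_means, top_k=3):
    """Hypothetical sketch: retrieve the top-k base classes most similar to a
    novel class by cosine similarity of class-name embeddings, then aggregate
    their stored feature means into a similarity-weighted prior mean."""
    # Normalize so that dot products equal cosine similarities.
    q = novel_name_emb / np.linalg.norm(novel_name_emb)
    b = bank_name_embs / np.linalg.norm(bank_name_embs, axis=1, keepdims=True)
    sims = b @ q                                  # (num_base,) cosine scores
    idx = np.argsort(sims)[::-1][:top_k]          # indices of top-k base classes
    weights = sims[idx] / sims[idx].sum()         # normalize retrieved scores
    prior_mean = (weights[:, None] * bank_means[idx]).sum(axis=0)
    return prior_mean, idx
```

In practice the name embeddings would come from CLIP's text encoder and the stored means from base-class features; here they are plain arrays so the retrieval logic stands alone.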

📝 Abstract
Few-shot learning aims to identify novel categories from only a handful of labeled samples, where prototypes estimated from scarce data are often biased and generalize poorly. Semantic-based methods alleviate this by introducing coarse class-level information, but they are mostly applied on the support side, leaving query representations unchanged. In this paper, we present PMCE, a Probabilistic few-shot framework that leverages Multi-granularity semantics with Caption-guided Enhancement. PMCE constructs a non-parametric knowledge bank that stores visual statistics for each category as well as CLIP-encoded class-name embeddings of the base classes. At meta-test time, the most relevant base classes are retrieved for each novel category based on class-name embedding similarity. These statistics are then aggregated into category-specific prior information and fused with the support-set prototypes via a simple MAP update. Simultaneously, a frozen BLIP captioner provides label-free instance-level image descriptions, and a lightweight enhancer trained on base classes refines both support prototypes and query features under an inductive protocol, with consistency regularization to stabilize noisy captions. Experiments on four benchmarks show that PMCE consistently improves over strong baselines, achieving up to 7.71% absolute gain over the strongest semantic competitor on MiniImageNet in the 1-shot setting. Our code is available at https://anonymous.4open.science/r/PMCE-275D
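The "simple MAP update" mentioned in the abstract can be illustrated with a conjugate-Gaussian shrinkage estimate, where the posterior mean interpolates between the support-set mean and the retrieved prior mean. This is a hedged sketch: the function name `map_prototype` and the pseudo-count hyperparameter `kappa` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def map_prototype(support_feats, prior_mean, kappa=1.0):
    """Hypothetical conjugate-Gaussian MAP update: shrink the support-set
    prototype toward a retrieved prior mean. With n support shots and prior
    strength kappa (an assumed pseudo-count), the posterior mean is a
    convex combination weighted n : kappa."""
    n = support_feats.shape[0]
    support_mean = support_feats.mean(axis=0)
    return (n * support_mean + kappa * prior_mean) / (n + kappa)
```

Note the intended behavior: in the 1-shot setting (n = 1) the prior carries substantial weight, which is where biased prototypes hurt most; as shots accumulate, the estimate converges to the plain support mean.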
Problem

Research questions and friction points this paper is trying to address.

few-shot learning
prototype bias
semantic information
query representation
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot learning
Probabilistic modeling
Multi-granularity semantics
Caption-guided enhancement
Nonparametric knowledge bank