MPA: Multimodal Prototype Augmentation for Few-Shot Learning

๐Ÿ“… 2026-02-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the limitation of existing few-shot learning approaches, which predominantly rely on visual modality alone and construct prototypes directly from raw images while neglecting richer multimodal semantic information. To overcome this, we propose a Multimodal Prototype Augmentation (MPA) framework that, for the first time, integrates three key components: Large Language Modelโ€“generated Semantic Embeddings (LMSE) for class descriptions, Hierarchical Multi-view Augmentation (HMA), and an Adaptive Uncertain Class Absorber (AUCA), to yield more robust prototype representations. Extensive experiments demonstrate that MPA significantly outperforms state-of-the-art methods across four single-domain and six cross-domain benchmarks, achieving absolute accuracy gains of 12.29% and 24.56% under the 5-way 1-shot setting, respectively. These results underscore the efficacy of multimodal fusion and explicit uncertainty modeling in few-shot classification.

Technology Category

Application Category

๐Ÿ“ Abstract
Recently, few-shot learning (FSL) has become a popular task that aims to recognize new classes from only a few labeled examples and has been widely applied in fields such as natural science, remote sensing, and medical images. However, most existing methods focus only on the visual modality and compute prototypes directly from raw support images, which lack comprehensive and rich multimodal information. To address these limitations, we propose a novel Multimodal Prototype Augmentation FSL framework called MPA, including LLM-based Multi-Variant Semantic Enhancement (LMSE), Hierarchical Multi-View Augmentation (HMA), and an Adaptive Uncertain Class Absorber (AUCA). LMSE leverages large language models to generate diverse paraphrased category descriptions, enriching the support set with additional semantic cues. HMA exploits both natural and multi-view augmentations to enhance feature diversity (e.g., changes in viewing distance, camera angles, and lighting conditions). AUCA models uncertainty by introducing uncertain classes via interpolation and Gaussian sampling, effectively absorbing uncertain samples. Extensive experiments on four single-domain and six cross-domain FSL benchmarks demonstrate that MPA achieves superior performance compared to existing state-of-the-art methods across most settings. Notably, MPA surpasses the second-best method by 12.29% and 24.56% in the single-domain and cross-domain setting, respectively, in the 5-way 1-shot setting.
Problem

Research questions and friction points this paper is trying to address.

Few-Shot Learning
Multimodal Information
Prototype Learning
Semantic Enhancement
Cross-Domain Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Prototype Augmentation
Few-Shot Learning
Large Language Models
Multi-View Augmentation
Uncertainty Modeling
๐Ÿ”Ž Similar Papers
No similar papers found.