Beyond Class Tokens: LLM-guided Dominant Property Mining for Few-shot Classification

📅 2025-07-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional few-shot learning (FSL) methods relying solely on class-name text embeddings suffer from insufficient visual representation diversity. Method: This paper proposes BCT-CLIP, which leverages large language models (LLMs) to automatically discover discriminative dominant properties from images, enabling fine-grained and multi-granular visual representations beyond coarse category names. It introduces a multi-property generator (MPG) and a clustering-based pruning mechanism for property refinement, and incorporates property-level contrastive learning to jointly encode global category semantics and local patch-aware features. The framework integrates LLM guidance, cross-modal cross-attention, and contrastive learning to significantly enhance inter-class discrimination. Contribution/Results: BCT-CLIP achieves state-of-the-art performance across 11 mainstream FSL benchmarks, empirically validating that dominant property mining is critical for improving few-shot generalization.

📝 Abstract
Few-shot Learning (FSL), which endeavors to develop the generalization ability for recognizing novel classes using only a few images, faces significant challenges due to data scarcity. Recent CLIP-like methods based on contrastive language-image pre-training mitigate the issue by leveraging the textual representation of the class name for unseen image discovery. Despite the achieved success, simply aligning visual representations to class-name embeddings compromises the visual diversity needed for novel class discrimination. To this end, we propose a novel FSL method (BCT-CLIP) that explores dominating properties via contrastive learning beyond simply using class tokens. By leveraging Large Language Model (LLM)-based prior knowledge, our method pushes forward FSL with comprehensive structural image representations, including both a global category representation and patch-aware property embeddings. In particular, we present a novel multi-property generator (MPG) with patch-aware cross-attention to generate multiple visual property tokens, an LLM-assisted retrieval procedure with clustering-based pruning to obtain dominating property descriptions, and a new contrastive learning strategy for property-token learning. The superior performance on 11 widely used datasets demonstrates that our investigation of dominating properties advances discriminative class-specific representation learning and few-shot classification.
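The MPG described in the abstract lets learnable property queries attend over image patch features via cross-attention. The following is a minimal numpy sketch of that idea, not the paper's implementation: the query count, dimensions, and projection names (`W_q`, `W_k`, `W_v`) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def property_cross_attention(prop_queries, patch_feats, W_q, W_k, W_v):
    """One patch-aware cross-attention step (sketch): learnable property
    queries attend over image patch features, yielding one patch-aware
    property token per query."""
    Q = prop_queries @ W_q                           # (P, d) projected queries
    K = patch_feats @ W_k                            # (N, d) projected patch keys
    V = patch_feats @ W_v                            # (N, d) projected patch values
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (P, N) attention weights
    return attn @ V                                  # (P, d) property tokens

rng = np.random.default_rng(0)
d = 16
patches = rng.normal(size=(49, d))   # e.g. a 7x7 patch grid (hypothetical size)
queries = rng.normal(size=(4, d))    # 4 hypothetical property slots
W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
tokens = property_cross_attention(queries, patches, *W)
print(tokens.shape)  # (4, 16)
```

Each row of the attention matrix mixes patch features differently, so each property token can specialize on a different local region.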
Problem

Research questions and friction points this paper is trying to address.

Address data scarcity in few-shot learning via dominant properties
Enhance class discrimination beyond simple class token alignment
Leverage LLM-guided property mining for comprehensive image representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided dominant property mining
Multi-property generator with cross-attentions
Contrastive learning for property tokens
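The clustering-based pruning listed above filters redundant LLM-generated property descriptions down to a dominant subset. Below is a minimal sketch under assumed details (plain k-means on description embeddings, keeping the description nearest each centroid); the function name and parameters are hypothetical, not the paper's procedure.

```python
import numpy as np

def prune_by_clustering(desc_embeds, k, iters=20, seed=0):
    """Cluster LLM-generated description embeddings with k-means (sketch)
    and keep one representative per cluster: the description closest to
    the cluster centroid. Returns the indices of the kept descriptions."""
    rng = np.random.default_rng(seed)
    # L2-normalize so Euclidean distance tracks cosine similarity
    X = desc_embeds / np.linalg.norm(desc_embeds, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)  # (n, k) distances
        labels = d2.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    # keep the description nearest to each centroid (duplicates collapse)
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
    return sorted({int(d2[:, c].argmin()) for c in range(k)})

rng = np.random.default_rng(1)
embeds = rng.normal(size=(30, 8))  # 30 hypothetical description embeddings
kept = prune_by_clustering(embeds, k=5)
print(kept)  # indices of at most 5 representative descriptions
```

Pruning via cluster representatives keeps the retained descriptions mutually dissimilar, which is the point of the refinement step: near-duplicate attributes add cost without adding discriminative signal.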
Wei Zhuo
School of Artificial Intelligence and the National Engineering Laboratory of Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China
Runjie Luo
School of Artificial Intelligence and the National Engineering Laboratory of Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China
Wufeng Xue
Shenzhen University; Xi'an Jiaotong University; University of Western Ontario
medical image analysis, computer vision, image processing, image quality assessment
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis