🤖 AI Summary
Conventional few-shot learning (FSL) methods that rely solely on class-name text embeddings suffer from insufficient visual representation diversity. Method: This paper proposes BCT-CLIP, which leverages large language models (LLMs) to automatically discover discriminative dominating properties from images, enabling fine-grained, multi-granular visual representations beyond coarse category names. It introduces a multi-property generator (MPG) and a clustering-based pruning mechanism for property refinement, and incorporates property-level contrastive learning to jointly encode global category semantics and local patch-aware features. The framework integrates LLM guidance, cross-modal cross-attention, and contrastive learning to significantly enhance inter-class discrimination. Contribution/Results: BCT-CLIP achieves state-of-the-art performance across 11 mainstream FSL benchmarks, empirically validating that mining dominating properties is critical for improving few-shot generalization.
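The summary mentions a clustering-based pruning step that distills many LLM-generated property descriptions down to a few dominating ones. The paper's exact procedure is not given here; the sketch below illustrates one plausible reading under assumed details (a plain k-means over text embeddings, keeping the description nearest each centroid; the function name and parameters are hypothetical):

```python
import numpy as np

def prune_properties(text_embs, k, n_iter=20, seed=0):
    """Toy clustering-based pruning (assumed k-means variant): cluster the
    embeddings of LLM-generated property descriptions, then keep the index of
    the description closest to each centroid as a 'dominating' representative."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct description embeddings
    centroids = text_embs[rng.choice(len(text_embs), k, replace=False)]
    for _ in range(n_iter):
        # assign each description to its nearest centroid
        dists = np.linalg.norm(text_embs[:, None] - centroids[None], axis=-1)  # (n, k)
        labels = dists.argmin(axis=1)
        # recompute centroids as cluster means
        for j in range(k):
            members = text_embs[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # keep the description nearest each centroid
    dists = np.linalg.norm(text_embs[:, None] - centroids[None], axis=-1)
    return sorted({int(dists[:, j].argmin()) for j in range(k)})

rng = np.random.default_rng(1)
embs = rng.standard_normal((12, 5))   # 12 candidate descriptions, 5-dim embeddings
kept = prune_properties(embs, k=3)
print(kept)
```

In this reading, pruning trades coverage for discriminativeness: redundant descriptions collapse into one representative per cluster, leaving a compact set of property prompts.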
📝 Abstract
Few-shot learning (FSL), which aims to develop the generalization ability to recognize novel classes from only a few images, faces significant challenges due to data scarcity. Recent CLIP-like methods based on contrastive language-image pre-training mitigate the issue by leveraging the textual representation of the class name for unseen image discovery. Despite this success, simply aligning visual representations to class-name embeddings compromises the visual diversity needed for novel-class discrimination. To this end, we propose a novel few-shot learning method, BCT-CLIP, that explores dominating properties via contrastive learning rather than relying on class tokens alone. By leveraging LLM-based prior knowledge, our method pushes FSL forward with comprehensive structural image representations, comprising both a global category representation and patch-aware property embeddings. In particular, we present a novel multi-property generator (MPG) with patch-aware cross-attention to generate multiple visual property tokens, a large language model (LLM)-assisted retrieval procedure with clustering-based pruning to obtain dominating property descriptions, and a new contrastive learning strategy for property-token learning. Superior performance on 11 widely used datasets demonstrates that our investigation of dominating properties advances discriminative class-specific representation learning and few-shot classification.
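The abstract describes the MPG as using patch-aware cross-attention to turn image patch features into multiple property tokens. The paper's architecture is not reproduced here; the following is a minimal numpy sketch of that mechanism under assumed shapes (learnable property queries attending to CLIP patch features; all names and projection matrices are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def property_tokens(queries, patches, Wq, Wk, Wv):
    """Patch-aware cross-attention sketch: P learnable property queries attend
    over N image patch features and return P property tokens.

    queries: (P, d) hypothetical learnable property query embeddings
    patches: (N, d) patch features, e.g. from a CLIP image encoder
    """
    Q = queries @ Wq                                  # (P, d)
    K = patches @ Wk                                  # (N, d)
    V = patches @ Wv                                  # (N, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))    # (P, N) attention weights
    return attn @ V                                   # (P, d) property tokens

rng = np.random.default_rng(0)
d, P, N = 8, 4, 16
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
tokens = property_tokens(rng.standard_normal((P, d)),
                         rng.standard_normal((N, d)), Wq, Wk, Wv)
print(tokens.shape)  # (4, 8)
```

Each output token aggregates a different weighted view of the patches, which is what lets the property tokens capture local, part-level evidence that a single class token would average away.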