🤖 AI Summary
CLIP's logits suffer from significant inter-class confusion in few-shot learning, which degrades downstream accuracy. To address this, the authors propose Logits DeConfusion (LDC), which introduces, for the first time, a learnable Inter-Class Deconfusion (ICD) module operating directly in the logits space, paired with a Multi-level Adapter Fusion (MAF) module for fine-grained vision-language feature alignment and discriminative enhancement. MAF adaptively extracts and fuses multi-level CLIP visual features to strengthen the representation, while ICD explicitly models inter-class similarity and suppresses confusing logits via a residual structure. Critically, the method requires no fine-tuning of the image encoder. Evaluated on standard few-shot benchmarks, including MiniImageNet and CUB, LDC consistently improves classification accuracy by 3.2–5.7% on average and substantially mitigates inter-class confusion. The source code is publicly available.
📝 Abstract
With its powerful vision-language alignment capability, CLIP performs well on zero-shot and few-shot learning tasks. However, we find experimentally that CLIP's logits suffer from severe inter-class confusion in downstream tasks, and this ambiguity between categories seriously harms accuracy. To address this challenge, we propose a novel method called Logits DeConfusion, which effectively learns and eliminates inter-class confusion in logits by combining our Multi-level Adapter Fusion (MAF) module with our Inter-Class Deconfusion (ICD) module. Our MAF extracts features from different levels and fuses them uniformly to enhance the feature representation. Our ICD learnably eliminates inter-class confusion in logits through a residual structure. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
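To make the residual deconfusion idea concrete, here is a minimal sketch (not the authors' implementation) of what a logits-space correction with a residual structure could look like. The confusion matrix `W` and the function `deconfuse` are hypothetical names for illustration; in the actual ICD module the correction would be learned end-to-end rather than fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5

# Hypothetical confusion matrix: off-diagonal entries model how much
# each class's logit "leaks" into the others. In the paper this mapping
# would be learnable; here it is random for illustration only.
W = rng.normal(scale=0.1, size=(num_classes, num_classes))
np.fill_diagonal(W, 0.0)  # only cross-class terms contribute to confusion

def deconfuse(logits, W):
    """Residual correction: subtract an estimate of the inter-class
    confusion component from the raw logits, i.e. logits' = logits - f(logits)."""
    confusion = logits @ W   # estimated cross-class leakage
    return logits - confusion

logits = rng.normal(size=(2, num_classes))   # raw CLIP logits for 2 samples
cleaned = deconfuse(logits, W)
assert cleaned.shape == logits.shape
```

With `W = 0` the module reduces to the identity, which is the usual appeal of a residual structure: the learned correction can start near zero and only remove confusion where the data supports it.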