Logits DeConfusion with CLIP for Few-Shot Learning

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
CLIP suffers significant performance degradation in few-shot learning because of inter-class confusion in its logits. To address this, we propose the Logits Deconfusion Framework (LDF), which introduces, for the first time, a learnable Inter-Class Deconfusion (ICD) module that operates directly in logit space. LDF pairs ICD with Multi-level Adapter Fusion (MAF) to achieve fine-grained vision-language feature alignment and stronger discrimination. MAF adaptively fuses multi-level CLIP visual features via residual connections, while ICD explicitly models inter-class similarity and suppresses confusing logits. LDF requires no fine-tuning of the image encoder. Evaluated on standard few-shot benchmarks, including MiniImageNet and CUB, LDF improves classification accuracy by 3.2–5.7% on average and substantially mitigates inter-class confusion. The source code is publicly available.

📝 Abstract
With its powerful vision-language alignment capability, CLIP performs well in zero-shot and few-shot learning tasks. However, our experiments show that CLIP's logits suffer from serious inter-class confusion in downstream tasks, and this ambiguity between categories substantially reduces accuracy. To address this challenge, we propose a novel method called Logits DeConfusion, which learns and eliminates inter-class confusion in the logits by combining our Multi-level Adapter Fusion (MAF) module with our Inter-Class Deconfusion (ICD) module. MAF extracts features from different levels and fuses them uniformly to enhance the feature representation. ICD uses a learnable residual structure to eliminate inter-class confusion in the logits. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
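To make the fusion idea concrete, here is a minimal sketch of one way features from several encoder levels could be projected into a shared space, averaged uniformly, and added back to the final feature through a residual connection. It is not the authors' implementation: the helper names (`matvec`, `fuse_levels`), the per-level linear adapters, the residual weight `alpha`, and every toy value are assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fuse_levels(level_feats, adapters, final_feat, alpha=0.5):
    """Hypothetical multi-level fusion.

    Each level's feature is projected by its own adapter matrix into the
    dimension of final_feat, the projections are averaged uniformly, and
    the average is added to final_feat with residual weight alpha.
    """
    projected = [matvec(W, f) for W, f in zip(adapters, level_feats)]
    n_levels = len(projected)
    fused = [
        sum(p[i] for p in projected) / n_levels
        for i in range(len(final_feat))
    ]
    return [f + alpha * g for f, g in zip(final_feat, fused)]


# Toy data (assumed): two levels with different widths, shared dimension 2.
level_feats = [[1.0, 0.0, 2.0], [0.5, -0.5]]
adapters = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # 2x3 adapter for level 0
    [[2.0, 0.0], [0.0, 2.0]],            # 2x2 adapter for level 1
]
final_feat = [0.2, 0.4]

enhanced = fuse_levels(level_feats, adapters, final_feat)
print(enhanced)
```

With `alpha=0.0` the function returns the final feature unchanged, which is the usual motivation for a residual design: the added branch can only refine the pretrained representation, not replace it.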
Problem

Research questions and friction points this paper is trying to address.

CLIP's logits suffer inter-class confusion in downstream tasks
Ambiguity between categories reduces few-shot learning accuracy
Propose Logits DeConfusion to eliminate inter-class logit confusion
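As a toy illustration of the problem described above, the sketch below computes CLIP-style logits as scaled cosine similarities between one image feature and several class text features, then measures how close the top two class probabilities are. The feature vectors, the scale of 100, and the helper names are assumptions chosen for the example; they are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_logits(image_feat, text_feats, scale=100.0):
    """Scaled cosine similarity between one image and each class text feature."""
    img = l2_normalize(image_feat)
    return [
        scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_feats
    ]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_margin(probs):
    """Gap between the two largest probabilities; a small gap signals confusion."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


# Toy data (assumed): classes 0 and 1 have nearly parallel text features,
# class 2 is clearly different.
text_feats = [[1.0, 0.0], [0.999, 0.045], [0.0, 1.0]]
image_feat = [1.0, 0.01]  # an image that belongs to class 0

logits = clip_style_logits(image_feat, text_feats)
probs = softmax(logits)
margin = top2_margin(probs)
print(logits, probs, margin)
```

In this toy setup the prediction is still class 0, but classes 0 and 1 receive almost equal probability. That near tie between similar categories is the kind of ambiguity the bullets above refer to.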
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logits DeConfusion method reduces inter-class confusion
Multi-level Adapter Fusion enhances feature representation
Inter-Class Deconfusion module with residual structure
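The residual idea behind the last bullet can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function name `deconfuse`, the class-by-class `confusion` matrix, and the `strength` parameter are hypothetical, and the matrix is set by hand here, whereas a learnable module would fit such a correction from few-shot data.

```python
def deconfuse(logits, confusion, strength=1.0):
    """Hypothetical residual correction in logit space.

    corrected[i] = logits[i] - strength * sum_j confusion[i][j] * logits[j]

    confusion[i][j] is an assumed estimate of how much evidence for class j
    leaks into class i; the diagonal is zero so a class never corrects itself.
    """
    correction = [
        sum(c * z for c, z in zip(row, logits)) for row in confusion
    ]
    return [z - strength * d for z, d in zip(logits, correction)]


# Toy data (assumed): classes 0 and 1 are easily confused, class 2 is not.
logits = [2.0, 1.8, -1.0]
confusion = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

corrected = deconfuse(logits, confusion)
margin_before = logits[0] - logits[1]
margin_after = corrected[0] - corrected[1]
print(corrected, margin_before, margin_after)
```

For a symmetric pair with leakage `c`, the gap between the two confused classes grows by a factor of `1 + c`, while an all-zero matrix leaves the logits untouched. Starting from that identity mapping is what makes the structure residual.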
Shuo Li
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Fang Liu
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Zehua Hao
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Xinyi Wang
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Lingling Li
Associate Director of Biostatistics, Sanofi Genzyme
Causal inference, missing data, propensity score, sequential analytic methods, drug and vaccine safety
Xu Liu
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Puhua Chen
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Wenping Ma
Xidian University
Artificial intelligence