Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection

📅 2025-04-09
🤖 AI Summary
Few-shot object detection (FSOD) faces two key challenges: (1) novel-class features are implicitly represented by base-class features, which leaves classifier decision boundaries inseparable; and (2) sparse novel-class samples cannot fully represent the class distribution, so fine-tuning is prone to overfitting. To address these, we propose a generalized feature representation learning method guided by semantic side information. First, a knowledge matrix built from embedding side information quantifies the semantic relationships between base and novel classes. Second, contextual semantic supervised contrastive learning, which embeds the side information, strengthens discrimination between semantically similar classes. Third, a side-information-guided region-aware masked module augments sample diversity: it uses counterfactual explanation to find and discard biased information that discriminates between similar classes, which mitigates overfitting. The method is backbone-agnostic (compatible with ResNet and ViT) and requires no additional annotations. Extensive experiments on PASCAL VOC, MS COCO, LVIS v1, FSOD-1K, and FSVOD-500 show that it outperforms previous state-of-the-art methods in most shots and class splits.

📝 Abstract
The objective of few-shot object detection (FSOD) is to detect novel objects with few training samples. The core challenge of this task is how to construct, on the basis of the base category space, a generalized feature space for novel categories with limited data, so that the learned detection model can adapt to unknown scenarios. However, limited by insufficient samples for novel categories, two issues still exist: (1) the features of a novel category are easily and implicitly represented by the features of the base categories, leading to inseparable classifier boundaries; and (2) the few samples of a novel category are not enough to fully represent its distribution, so model fine-tuning is prone to overfitting. To address these issues, we introduce side information to alleviate the negative influences arising from both the feature space and the sample viewpoints, and formulate a novel generalized feature representation learning method for FSOD. Specifically, we first utilize embedding side information to construct a knowledge matrix that quantifies the semantic relationship between the base and novel categories. Then, to strengthen the discrimination between semantically similar categories, we further develop contextual semantic supervised contrastive learning, which embeds side information. Furthermore, to prevent overfitting caused by sparse samples, a side-information-guided region-aware masked module is introduced to augment the diversity of samples; it finds and abandons biased information that discriminates between similar categories via counterfactual explanation, and further refines the discriminative representation space. Extensive experiments using ResNet and ViT backbones on the PASCAL VOC, MS COCO, LVIS V1, FSOD-1K, and FSVOD-500 benchmarks demonstrate that our model outperforms previous state-of-the-art methods, significantly improving FSOD performance in most shots/splits.
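
The knowledge matrix described in the abstract can be pictured with a small sketch. The abstract does not say how the matrix is computed, so the version below is only an assumption: it takes one embedding vector per class name and fills the matrix with cosine similarities between novel and base classes. The function names, class names, and 3-dimensional vectors are all hypothetical stand-ins for real word embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knowledge_matrix(novel_emb, base_emb):
    """Rows are novel classes, columns are base classes.

    Assumption: each entry is the cosine similarity of the two class
    embeddings; the paper may define the entries differently.
    """
    return {
        novel: {base: cosine(nv, bv) for base, bv in base_emb.items()}
        for novel, nv in novel_emb.items()
    }


# Toy vectors standing in for real class-name embeddings (illustrative only).
base_emb = {"horse": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9]}
novel_emb = {"cow": [0.8, 0.3, 0.1], "bus": [0.1, 0.1, 0.95]}

K = knowledge_matrix(novel_emb, base_emb)

# For each novel class, the base class it is most easily confused with.
closest = {novel: max(row, key=row.get) for novel, row in K.items()}
print(closest)  # {'cow': 'horse', 'bus': 'car'}
```

Under this reading, high entries flag the base/novel pairs whose features are most likely to overlap, which is where extra discrimination is needed.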

Problem

Research questions and friction points this paper is trying to address.

Constructing a generalized feature space for novel categories from limited data
Resolving inseparable classifier boundaries between base and novel categories in few-shot object detection
Preventing overfitting caused by sparse novel-category samples by using side information
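
For the overfitting problem, the abstract describes a region-aware masked module that augments sample diversity by discarding biased regions. A minimal sketch of that general idea follows, assuming a per-cell relevance map is already available (for example from some attribution method). The `relevance` input, the function name, and the top-k rule are hypothetical and are not taken from the paper.

```python
def mask_top_regions(feature_grid, relevance, k):
    """Return a copy of feature_grid with the k highest-relevance cells zeroed.

    feature_grid and relevance are 2-D lists of the same shape.
    Assumption: high relevance marks regions the model over-relies on.
    """
    cells = [
        (relevance[i][j], i, j)
        for i in range(len(relevance))
        for j in range(len(relevance[0]))
    ]
    cells.sort(reverse=True)  # most relevant cells first
    to_mask = {(i, j) for _, i, j in cells[:k]}
    return [
        [0.0 if (i, j) in to_mask else value for j, value in enumerate(row)]
        for i, row in enumerate(feature_grid)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]
relevance = [[0.1, 0.7], [0.2, 0.9]]  # hypothetical relevance scores

masked = mask_top_regions(features, relevance, k=2)
print(masked)  # [[1.0, 0.0], [3.0, 0.0]]
```

Each masked copy acts as an extra training sample in which the most relied-upon regions are hidden, so the detector has to use the remaining evidence.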

Innovation

Methods, ideas, or system contributions that make the work stand out.

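A knowledge matrix, built from embedding side information, that quantifies semantic relationships between base and novel categories
Contextual semantic supervised contrastive learning that strengthens discrimination between semantically similar categories
A side-information-guided region-aware masked module that augments sample diversity to prevent overfitting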
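
To make the contrastive component more concrete, here is a hedged sketch of a supervised contrastive loss in which negatives from semantically similar classes are up-weighted. The base form is the standard supervised contrastive loss; the weighting rule `1 + class_sim`, the `class_sim` lookup, and all names and values are assumptions for illustration, not the loss defined in the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def semantic_supcon_loss(feats, labels, class_sim, tau=0.1):
    """Supervised contrastive loss with semantically re-weighted negatives.

    feats:     list of L2-normalised feature vectors
    labels:    class label of each feature
    class_sim: class_sim[a][b] >= 0, semantic similarity of classes a and b
               (hypothetical input, e.g. taken from a knowledge matrix)
    """
    total, anchors = 0.0, 0
    n = len(feats)
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor needs at least one positive
        denom = 0.0
        for a in range(n):
            if a == i:
                continue
            same = labels[a] == labels[i]
            # Assumed rule: similar-class negatives count more in the denominator.
            weight = 1.0 if same else 1.0 + class_sim[labels[i]][labels[a]]
            denom += weight * math.exp(dot(feats[i], feats[a]) / tau)
        log_denom = math.log(denom)
        total -= sum(
            dot(feats[i], feats[p]) / tau - log_denom for p in positives
        ) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


feats = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
labels = ["cow", "cow", "horse", "horse"]
similar = {"cow": {"horse": 0.9}, "horse": {"cow": 0.9}}
unrelated = {"cow": {"horse": 0.0}, "horse": {"cow": 0.0}}

loss_plain = semantic_supcon_loss(feats, labels, unrelated)
loss_weighted = semantic_supcon_loss(feats, labels, similar)
print(loss_weighted > loss_plain)  # True
```

With all similarities set to zero the function reduces to the plain supervised contrastive loss. A positive similarity enlarges the penalty for features that sit close to a semantically related class, which pushes those classes further apart.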

Ruoyu Chen
Institute of Information Engineering, Chinese Academy of Sciences
Explainable AI, Trustworthy AI, Foundation Model

Hua Zhang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China, and also with the School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

Jingzhi Li
University of Science and Technology Beijing
Face Privacy, Trustworthy AI

Li Liu
College of Electronic Science and Technology, National University of Defense Technology, Changsha 430074, China

Zhen Huang
College of Computer, National University of Defense Technology, Changsha 430074, China

Xiaochun Cao
Sun Yat-sen University
Computer Vision, Artificial Intelligence, Multimedia, Machine Learning