🤖 AI Summary
Existing few-shot object detection methods, mostly built on the Faster R-CNN framework, embed objectness and category features in a shared space, which leads to class-specific objectness criteria, biased decision boundaries for novel classes, and limited generalization. To address this, the authors propose a Uniform Orthogonal Feature Space (UOFS) optimization framework with a magnitude-angle disentanglement mechanism: feature magnitude encodes objectness (foreground vs. background), while feature angle encodes category semantics, so class-agnostic objectness knowledge can transfer from base to novel classes. They further design a Hybrid Background Optimization (HBO) strategy to mitigate interference from unlabeled foreground instances and to alleviate overfitting of the angular space to base classes, and a Spatial-wise Attention Disentanglement and Association (SADA) module to reduce conflicts between the class-agnostic and class-specific tasks. Experiments on standard benchmarks, including PASCAL VOC and COCO, show significant improvements over state-of-the-art methods, supporting the effectiveness of orthogonal feature disentanglement and uniform optimization for few-shot detection.
📝 Abstract
Few-shot object detection (FSOD) aims to detect objects with limited samples for novel classes, while relying on abundant data for base classes. Existing FSOD approaches, predominantly built on the Faster R-CNN detector, entangle objectness recognition and foreground classification within shared feature spaces. This paradigm inherently establishes class-specific objectness criteria and suffers from unrepresentative novel class samples. To resolve this limitation, we propose a Uniform Orthogonal Feature Space (UOFS) optimization framework. First, UOFS decouples the feature space into two orthogonal components, where magnitude encodes objectness and angle encodes classification. This decoupling enables transferring class-agnostic objectness knowledge from base classes to novel classes. Moreover, implementing the disentanglement requires careful attention to two challenges: (1) Base set images contain unlabeled foreground instances, causing confusion between potential novel class instances and backgrounds. (2) Angular optimization depends exclusively on base class foreground instances, inducing overfitting of angular distributions to base classes. To address these challenges, we propose a Hybrid Background Optimization (HBO) strategy: (1) Constructing a pure background base set by removing unlabeled instances in original images to provide unbiased magnitude-based objectness supervision. (2) Incorporating unlabeled foreground instances in the original base set into angular optimization to enhance distribution uniformity. Additionally, we propose a Spatial-wise Attention Disentanglement and Association (SADA) module to address task conflicts between class-agnostic and class-specific tasks. Experiments demonstrate that our method significantly outperforms existing approaches based on entangled feature spaces.
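To make the magnitude/angle split concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a region feature is a plain vector, that objectness is read from its norm through a sigmoid, and that the class is chosen by cosine similarity to per-class prototype vectors. The function names, the `threshold` and `sharpness` parameters, and the toy prototypes are all hypothetical and do not come from the paper.

```python
import math

def decompose(feature):
    """Split a feature vector into (magnitude, unit direction)."""
    magnitude = math.sqrt(sum(x * x for x in feature))
    if magnitude == 0.0:
        return 0.0, [0.0] * len(feature)
    return magnitude, [x / magnitude for x in feature]

def objectness_score(magnitude, threshold=1.0, sharpness=4.0):
    """Hypothetical class-agnostic objectness: a sigmoid of the feature norm.

    `threshold` and `sharpness` are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (magnitude - threshold)))

def classify_by_angle(direction, prototypes):
    """Return the label whose prototype direction has the highest cosine
    similarity with `direction`, together with that cosine."""
    best_label, best_cos = None, -2.0
    for label, proto in prototypes.items():
        _, proto_dir = decompose(proto)
        cos = sum(a * b for a, b in zip(direction, proto_dir))
        if cos > best_cos:
            best_label, best_cos = label, cos
    return best_label, best_cos

# Toy 2-D prototypes (assumed for illustration only).
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

strong = [3.0, 0.3]   # large norm: likely foreground
weak = [0.1, 0.01]    # same direction, small norm: likely background

for name, feat in (("strong", strong), ("weak", weak)):
    mag, direction = decompose(feat)
    label, cos = classify_by_angle(direction, prototypes)
    print(name, round(objectness_score(mag), 3), label, round(cos, 3))
```

In this toy setup the two features point in the same direction, so they receive the same class decision, while their norms alone separate foreground from background. This mirrors the idea that objectness and classification occupy orthogonal components of the feature space.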