Attention-disentangled Uniform Orthogonal Feature Space Optimization for Few-shot Object Detection

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing few-shot object detection methods, predominantly built on the Faster R-CNN framework, embed objectness and category features in a shared space, which biases decision boundaries for novel classes and limits generalization. To address this, the paper proposes a Uniform Orthogonal Feature Space (UOFS) optimization framework with a magnitude-angle disentanglement mechanism: magnitude encodes objectness (foreground/background), while angle encodes category semantics, so class-agnostic objectness knowledge can transfer from base to novel classes. A Hybrid Background Optimization (HBO) strategy mitigates interference from unlabeled foreground instances and reduces overfitting of the angular distribution to base classes, and a Spatial-wise Attention Disentanglement and Association (SADA) module addresses conflicts between the class-agnostic and class-specific tasks. Extensive experiments on standard benchmarks, including PASCAL VOC and COCO, demonstrate significant improvements over state-of-the-art methods, supporting the effectiveness of orthogonal feature disentanglement and uniform optimization for few-shot detection.

📝 Abstract
Few-shot object detection (FSOD) aims to detect objects with limited samples for novel classes, while relying on abundant data for base classes. Existing FSOD approaches, predominantly built on the Faster R-CNN detector, entangle objectness recognition and foreground classification within shared feature spaces. This paradigm inherently establishes class-specific objectness criteria and suffers from unrepresentative novel class samples. To resolve this limitation, we propose a Uniform Orthogonal Feature Space (UOFS) optimization framework. First, UOFS decouples the feature space into two orthogonal components, where magnitude encodes objectness and angle encodes classification. This decoupling enables transferring class-agnostic objectness knowledge from base classes to novel classes. Moreover, implementing the disentanglement requires careful attention to two challenges: (1) Base set images contain unlabeled foreground instances, causing confusion between potential novel class instances and backgrounds. (2) Angular optimization depends exclusively on base class foreground instances, inducing overfitting of angular distributions to base classes. To address these challenges, we propose a Hybrid Background Optimization (HBO) strategy: (1) Constructing a pure background base set by removing unlabeled instances in original images to provide unbiased magnitude-based objectness supervision. (2) Incorporating unlabeled foreground instances in the original base set into angular optimization to enhance distribution uniformity. Additionally, we propose a Spatial-wise Attention Disentanglement and Association (SADA) module to address task conflicts between class-agnostic and class-specific tasks. Experiments demonstrate that our method significantly outperforms existing approaches based on entangled feature spaces.
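The magnitude/angle split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed `objectness_threshold`, and the use of class prototypes with cosine similarity are all assumptions made for illustration. It only shows how a feature vector's norm can carry a class-agnostic foreground/background signal while its direction carries the class decision.

```python
import math

def decompose(feature):
    """Split a feature vector into (magnitude, unit direction)."""
    magnitude = math.sqrt(sum(x * x for x in feature))
    if magnitude == 0.0:
        return 0.0, [0.0] * len(feature)
    return magnitude, [x / magnitude for x in feature]

def cosine_scores(direction, prototypes):
    """Cosine similarity between a unit direction and each class prototype."""
    scores = {}
    for name, proto in prototypes.items():
        _, proto_dir = decompose(proto)
        scores[name] = sum(a * b for a, b in zip(direction, proto_dir))
    return scores

def predict(feature, prototypes, objectness_threshold=1.0):
    """Hypothetical decision rule: magnitude decides objectness, angle decides class."""
    magnitude, direction = decompose(feature)
    if magnitude < objectness_threshold:
        # Low magnitude is treated as background, regardless of direction.
        return "background", magnitude
    scores = cosine_scores(direction, prototypes)
    return max(scores, key=scores.get), magnitude

# Toy prototypes (illustrative values only).
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
print(predict([3.0, 4.0], prototypes))  # large norm, direction closer to "dog"
print(predict([0.3, 0.4], prototypes))  # same direction, but norm below the threshold
```

Because the two example features point in the same direction and differ only in norm, the class score is identical for both; only the objectness decision changes. That independence is the property the orthogonal decomposition is meant to provide.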
Problem

Research questions and friction points this paper is trying to address.

Disentangles objectness from classification so class-agnostic objectness knowledge transfers to novel classes
Addresses confusion from unlabeled novel class instances in base set
Prevents overfitting of angular distributions to base classes
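To make "uniformity of the angular distribution" concrete, the sketch below scores how evenly a set of feature directions covers the unit sphere. It uses the pairwise Gaussian-potential uniformity measure known from the contrastive-learning literature as a stand-in; the source does not state which objective the paper actually optimizes, so the function names and the temperature `t=2.0` are assumptions.

```python
import math
from itertools import combinations

def normalize(v):
    """Project a non-zero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def uniformity(directions, t=2.0):
    """log of the mean of exp(-t * ||u - v||^2) over all pairs of unit vectors.

    Lower values mean the directions are spread more evenly.
    Requires at least two non-zero vectors.
    """
    units = [normalize(v) for v in directions]
    potentials = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(u, v)))
        for u, v in combinations(units, 2)
    ]
    return math.log(sum(potentials) / len(potentials))

# Directions crowded into a narrow cone (e.g. fitted only to a few base classes).
clustered = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2]]
# Directions roughly 120 degrees apart.
spread = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]

print(uniformity(clustered))  # close to 0
print(uniformity(spread))     # clearly more negative
```

Under this measure, adding directions from additional foreground instances that fall outside the base-class cone lowers the score, which matches the intuition behind including unlabeled foreground instances in angular optimization.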
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples feature space into orthogonal components
Uses Hybrid Background Optimization strategy
Implements a Spatial-wise Attention Disentanglement and Association (SADA) module
Taijin Zhao
University of Electronic Science and Technology of China, Chengdu, 611731, China
Heqian Qiu
University of Electronic Science and Technology of China, UESTC
Object Detection, Multimodal
Yu Dai
University of Electronic Science and Technology of China, Chengdu, 611731, China
Lanxiao Wang
University of Electronic Science and Technology of China, Chengdu, 611731, China
Fanman Meng
University of Electronic Science and Technology of China, Chengdu, 611731, China
Qingbo Wu
University of Electronic Science and Technology of China
video coding, image and video quality assessment
Hongliang Li
University of Electronic Science and Technology of China, Chengdu, 611731, China