YOLO-PRO: Enhancing Instance-Specific Object Detection with Full-Channel Global Self-Attention

📅 2025-03-04
🤖 AI Summary
Traditional bottleneck structures in object detection suffer from weakened instance discriminability due to reliance on batch-level statistics, while decoupled heads introduce computational redundancy. To address these issues, this paper proposes an Instance-Specific Bottleneck (ISB) and an Instance-Specific Asymmetric Decoupled Head (ISADH). ISB introduces a full-channel global self-attention mechanism that jointly integrates batch-level statistics and instance-level features, enabling dual-stream heterogeneous representation learning. ISADH is presented as the first decoupled head design supporting hierarchical multi-dimensional feature fusion, substantially reducing parameter and FLOP redundancy. Integrated into a lightweight YOLO framework, the proposed modules achieve state-of-the-art performance on MS-COCO: +1.0–1.6% AP over YOLOv8 (N/S/M/L/X scales) and +0.1–0.5% AP over YOLO11 (M/L/X scales), with competitive computational cost for edge deployment.

📝 Abstract
This paper addresses the inherent limitations of conventional bottleneck structures (diminished instance discriminability due to overemphasis on batch statistics) and decoupled heads (computational redundancy) in object detection frameworks by proposing two novel modules: the Instance-Specific Bottleneck with full-channel global self-attention (ISB) and the Instance-Specific Asymmetric Decoupled Head (ISADH). The ISB module innovatively reconstructs feature maps to establish an efficient full-channel global attention mechanism through synergistic fusion of batch-statistical and instance-specific features. Complementing this, the ISADH module pioneers an asymmetric decoupled architecture enabling hierarchical multi-dimensional feature integration via dual-stream batch-instance representation fusion. Extensive experiments on the MS-COCO benchmark demonstrate that the coordinated deployment of ISB and ISADH in the YOLO-PRO framework achieves state-of-the-art performance across all computational scales. Specifically, YOLO-PRO surpasses YOLOv8 by 1.0-1.6% AP (N/S/M/L/X scales) and outperforms YOLO11 by 0.1-0.5% AP in critical M/L/X groups, while maintaining competitive computational efficiency. This work provides practical insights for developing high-precision detectors deployable on edge devices.
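The distinction between batch-level statistics and instance-specific features can be made concrete with a small sketch. The code below is not the paper's ISB module; it only contrasts normalizing one channel with statistics pooled over the whole batch against normalizing it per instance, and then mixes the two views with a fixed weight. The function names and the `alpha` mixing weight are hypothetical, chosen for illustration.

```python
from statistics import fmean, pstdev


def normalize(values, mean, std, eps=1e-5):
    """Shift and scale a flat list of activations."""
    return [(v - mean) / (std + eps) for v in values]


def batch_normalize(batch):
    """Normalize every instance with statistics pooled over the whole batch."""
    pooled = [v for instance in batch for v in instance]
    mean, std = fmean(pooled), pstdev(pooled)
    return [normalize(instance, mean, std) for instance in batch]


def instance_normalize(batch):
    """Normalize every instance with its own mean and standard deviation."""
    return [normalize(inst, fmean(inst), pstdev(inst)) for inst in batch]


def fuse(batch_view, instance_view, alpha=0.5):
    """Hypothetical fixed-weight mix of the two views (an assumption, not the paper's fusion)."""
    return [
        [alpha * b + (1.0 - alpha) * i for b, i in zip(b_row, i_row)]
        for b_row, i_row in zip(batch_view, instance_view)
    ]


# Two instances, one channel each, flattened to four activations.
batch = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
bn = batch_normalize(batch)      # depends on every instance in the batch
inn = instance_normalize(batch)  # depends only on the instance itself
fused = fuse(bn, inn)

print(bn)
print(inn)
print(fused)
```

In this toy batch, the batch-normalized output of each instance changes whenever the other instance changes, while the instance-normalized output does not. A dual-stream design, as the abstract describes it, keeps both sources of information available rather than choosing one.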
Problem

Research questions and friction points this paper is trying to address.

Overcomes the loss of instance discriminability that conventional bottleneck structures incur by overemphasizing batch statistics.
Reduces computational redundancy in decoupled heads.
Enhances instance-specific object detection accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Full-channel global self-attention mechanism
Asymmetric decoupled head architecture
Hierarchical multi-dimensional feature integration
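
As a rough illustration of what attention across all channels of a feature map can look like, the sketch below treats each channel of a C x H x W map as one token (its flattened spatial values) and applies scaled dot-product self-attention across the C tokens. This is a generic channel-attention sketch under stated assumptions, not the ISB design: the summary does not specify how the paper reconstructs feature maps, and the learned query/key/value projections of a real attention layer are omitted here. The function name `channel_self_attention` is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_self_attention(feature_map):
    """Attend across channels of a C x H x W nested-list feature map.

    Assumption: queries, keys, and values are the raw flattened channels;
    a trained layer would apply learned projections first.
    Returns (output, attention) with output shaped C x H x W
    and attention shaped C x C.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Flatten each channel into one token of length H * W.
    tokens = [[v for row in channel for v in row] for channel in feature_map]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    attention = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        attention.append(softmax(scores))

    # Each output channel is an attention-weighted sum of all input channels.
    mixed = [
        [sum(w * token[i] for w, token in zip(weights, tokens)) for i in range(dim)]
        for weights in attention
    ]
    # Restore the H x W layout of every channel.
    output = [
        [flat[r * width:(r + 1) * width] for r in range(height)]
        for flat in mixed
    ]
    return output, attention


feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
output, attention = channel_self_attention(feature_map)
for weights in attention:
    print([round(w, 3) for w in weights])
```

Because the attention matrix is C x C rather than (H·W) x (H·W), its size does not grow with spatial resolution, which is one reason channel-wise attention is often considered for lightweight detectors.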