A3-FPN: Asymptotic Content-Aware Pyramid Attention Network for Dense Visual Prediction

📅 2026-04-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing feature pyramid networks struggle to model multi-scale discriminative features for dense visual prediction and underperform on small objects. This work proposes A3-FPN, an architecture that progressively (asymptotically) disentangles each pyramid level through global feature interaction and adds content-aware attention to strengthen feature representations. In the fusion stage, it performs context-aware resampling driven by position-wise offsets and content-adaptive weights generated from adjacent-level features; in the reassembly stage, it applies an information-driven strategy that reassembles redundant features. A3-FPN is compatible with both CNN and Transformer backbones and delivers significant gains across multiple benchmarks: paired with OneFormer and a Swin-L backbone, it achieves 49.6 mask AP on MS COCO and 85.6 mIoU on Cityscapes, and it also shows strong results on VisDrone2019-DET.
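The fusion step summarized above (resampling the adjacent level with position-wise offsets and content-adaptive weights) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which lives in their repository. The generator `predict_offsets_and_weights`, its single 1x1 projection `proj`, the tanh bound on offsets, the sigmoid gate and the additive fusion are all hypothetical choices made for illustration. With zero offsets and unit weights, `offset_resample` reduces to plain bilinear upsampling.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample a (C, H, W) map at fractional positions (ys, xs).

    ys, xs have shape (h, w); positions are clamped to the map border.
    Returns an array of shape (C, h, w).
    """
    _, H, W = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def offset_resample(coarse, offsets, weights, scale=2):
    """Upsample a coarse (C, H, W) map by `scale` with per-pixel offsets.

    Each output pixel samples the coarse map at its nominal source position
    plus a (dy, dx) offset in coarse-pixel units; the sample is then
    multiplied by a per-pixel weight.
    offsets: (2, H*scale, W*scale); weights: (H*scale, W*scale).
    """
    _, H, W = coarse.shape
    yy, xx = np.meshgrid(np.arange(H * scale), np.arange(W * scale), indexing="ij")
    # Map output pixel centres to coarse coordinates (half-pixel convention).
    base_y = (yy + 0.5) / scale - 0.5
    base_x = (xx + 0.5) / scale - 0.5
    sampled = bilinear_sample(coarse, base_y + offsets[0], base_x + offsets[1])
    return sampled * weights


def predict_offsets_and_weights(fine, coarse_up, proj, max_offset=1.0):
    """Hypothetical generator: one 1x1 projection over the concatenated fine
    and naively upsampled coarse features gives (dy, dx, weight logit)."""
    x = np.concatenate([fine, coarse_up], axis=0)  # (Cf + Cc, Ho, Wo)
    out = np.einsum("kc,chw->khw", proj, x)         # (3, Ho, Wo)
    offsets = max_offset * np.tanh(out[:2])         # bounded by max_offset
    weights = 1.0 / (1.0 + np.exp(-out[2]))         # sigmoid gate in (0, 1)
    return offsets, weights


# Usage: fuse a 4x4 coarse level into an 8x8 fine level (random toy features).
rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 4, 4))
fine = rng.standard_normal((8, 8, 8))
coarse_up = coarse.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbour 2x
proj = 0.1 * rng.standard_normal((3, 16))
offsets, weights = predict_offsets_and_weights(fine, coarse_up, proj)
fused = fine + offset_resample(coarse, offsets, weights)
print(fused.shape)  # (8, 8, 8)
```

Because the offsets and gate are predicted from both levels, each output position can pull coarse-level context from a slightly different location and scale its contribution. A fixed upsampler cannot do either.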

πŸ“ Abstract
Learning multi-scale representations is a common strategy to tackle object scale variation in dense prediction tasks. Although existing feature pyramid networks have greatly advanced visual recognition, inherent design defects prevent them from capturing discriminative features and recognizing small objects. In this work, we propose the Asymptotic Content-Aware Pyramid Attention Network (A3-FPN) to augment multi-scale feature representation via an asymptotically disentangled framework and content-aware attention modules. Specifically, A3-FPN employs a horizontally spread column network that enables asymptotically global feature interaction and disentangles each level from all hierarchical representations. In feature fusion, it collects supplementary content from the adjacent level to generate position-wise offsets and weights for context-aware resampling, and learns deep context reweighting to improve intra-category similarity. In feature reassembly, it further strengthens intra-scale discriminative feature learning and reassembles redundant features based on the information content and spatial variation of feature maps. Extensive experiments on MS COCO, VisDrone2019-DET and Cityscapes demonstrate that A3-FPN can be easily integrated into state-of-the-art CNN and Transformer-based architectures, yielding remarkable performance gains. Notably, when paired with OneFormer and a Swin-L backbone, A3-FPN achieves 49.6 mask AP on MS COCO and 85.6 mIoU on Cityscapes. Code is available at https://github.com/mason-ching/A3-FPN.
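The reassembly step ranks features by their information content and spatial variation. The following NumPy sketch is a hedged illustration only. It scores each channel with two assumed proxies: histogram entropy stands in for information content, and mean absolute finite-difference gradient stands in for spatial variation. It then splits the channels into an informative set and a redundant set. The function names, both proxies, the min-max normalization and `keep_ratio` are hypothetical. The sketch covers only the selection step, not how A3-FPN actually reassembles the redundant part.

```python
import numpy as np


def channel_entropy(feat, bins=16):
    """Shannon entropy (bits) of each channel's activation histogram.

    feat: (C, H, W). A constant channel has entropy 0.
    """
    ent = np.zeros(feat.shape[0])
    for c in range(feat.shape[0]):
        counts, _ = np.histogram(feat[c], bins=bins)
        p = counts[counts > 0] / counts.sum()
        ent[c] = -(p * np.log2(p)).sum()
    return ent


def spatial_variation(feat):
    """Mean absolute finite-difference gradient of each channel."""
    dy = np.abs(np.diff(feat, axis=1)).mean(axis=(1, 2))
    dx = np.abs(np.diff(feat, axis=2)).mean(axis=(1, 2))
    return dy + dx


def split_redundant(feat, keep_ratio=0.5, bins=16):
    """Hypothetical split: return (informative_idx, redundant_idx, score)."""

    def norm(v):
        # Min-max normalise across channels so neither cue dominates.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    score = norm(channel_entropy(feat, bins)) + norm(spatial_variation(feat))
    k = max(1, int(round(keep_ratio * feat.shape[0])))
    order = np.argsort(-score, kind="stable")  # highest score first
    return np.sort(order[:k]), np.sort(order[k:]), score


# Usage: two random channels and two flat (uninformative) channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16, 16))
feat[1] = 0.0
feat[3] = 0.0
informative, redundant, score = split_redundant(feat, keep_ratio=0.5)
print(informative, redundant)  # [0 2] [1 3]
```

The two cues are complementary: a channel can carry many distinct values yet be spatially smooth, or vary sharply while taking only a few values. Combining them ranks flat, uninformative channels last.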
Problem

Research questions and friction points this paper is trying to address.

dense visual prediction
feature pyramid network
small object detection
multi-scale representation
discriminative features
Innovation

Methods, ideas, or system contributions that make the work stand out.

asymptotic disentanglement
content-aware attention
feature pyramid network
multi-scale representation
dense visual prediction
Meng'en Qin
Henan Engineering Research Center for Artificial Intelligence Theory and Algorithms, Henan University, Kaifeng, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China; Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong, China
Yu Song
Henan Engineering Research Center for Artificial Intelligence Theory and Algorithms, Henan University, Kaifeng, China
Quanling Zhao
Henan Engineering Research Center for Artificial Intelligence Theory and Algorithms, Henan University, Kaifeng, China
Xiaodong Yang
Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China
Yingtao Che
Henan Engineering Research Center for Artificial Intelligence Theory and Algorithms, Henan University, Kaifeng, China
Xiaohui Yang
Henan University, Associate Professor
pattern recognition, intelligence information processing