A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

📅 2024-06-09

🏛️ IEEE Transactions on Geoscience and Remote Sensing

📈 Citations: 18

✨ Influential: 1


🤖 AI Summary

To address the low detection accuracy of tiny objects (e.g., vehicles, transmission towers) in remote sensing imagery, which stems from the very few pixels such objects occupy, this paper proposes the DNTR framework. First, a DeNoising Feature Pyramid Network (DN-FPN) uses contrastive learning to suppress the noise that arises when multiscale features are fused in the FPN's top-down path. Second, a Transformer-based two-stage detector (Trans R-CNN) replaces the conventional R-CNN head and uses self-attention to strengthen the representation of tiny objects. Key contributions include: (1) integrating contrastive learning into the FPN as a denoising mechanism; (2) incorporating a Transformer into the R-CNN stage of a two-stage detector, tailored to tiny-object detection; and (3) a modular design that lets DN-FPN be plugged into existing detectors. Experiments on AI-TOD and VisDrone show that DNTR outperforms the baselines by at least 17.4% in AP_vt (AP for very tiny objects) on AI-TOD and by 9.6% in AP on VisDrone.
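AP_vt is average precision restricted to very tiny objects. As a rough illustration of how ground-truth boxes are bucketed by size, the Python sketch below assumes the ranges commonly reported for AI-TOD (very tiny 2 to 8 px, tiny 8 to 16 px, small 16 to 32 px, medium 32 to 64 px, measured as the square root of box area). The helper names and the half-open boundary convention are hypothetical; the dataset's official evaluation code is authoritative.

```python
import math

# ASSUMPTION: AI-TOD-style size buckets on sqrt(width * height) in pixels.
# Thresholds follow commonly reported values; verify against the official eval code.
AITOD_BINS = [
    ("very_tiny", 2.0, 8.0),
    ("tiny", 8.0, 16.0),
    ("small", 16.0, 32.0),
    ("medium", 32.0, 64.0),
]


def size_bucket(width: float, height: float) -> str:
    """Return the size bucket of a box, using half-open ranges [lo, hi)."""
    size = math.sqrt(width * height)
    for name, lo, hi in AITOD_BINS:
        if lo <= size < hi:
            return name
    return "out_of_range"


def per_bucket_counts(boxes):
    """Count (width, height) boxes per bucket; AP_vt would be evaluated on 'very_tiny' only."""
    counts = {name: 0 for name, _, _ in AITOD_BINS}
    counts["out_of_range"] = 0
    for w, h in boxes:
        counts[size_bucket(w, h)] += 1
    return counts
```

For example, a 4 x 4 px vehicle falls in `very_tiny`, while a 10 x 10 px object falls in `tiny`, so improvements in AP_vt reflect only the smallest boxes.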


📝 Abstract

Despite notable advancements in the field of computer vision (CV), the precise detection of tiny objects continues to pose a significant challenge, largely due to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of applications ranging from urban planning to environmental monitoring. In this article, we propose a new framework, namely, DeNoising feature pyramid network (FPN) with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans region-based convolutional neural network (R-CNN). First, feature fusion in the FPN is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that utilizes contrastive learning to suppress noise in each level's features in the top–down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. The experimental results show that our DNTR outperforms the baselines by at least 17.4% in terms of $\text{AP}_{vt}$ on the AI-TOD dataset and 9.6% in terms of average precision (AP) on the VisDrone dataset, respectively. Our code will be available at https://github.com/hoiliu-0801/DNTR.
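The noise argument hinges on how the FPN top-down path fuses levels. The minimal NumPy sketch below shows a generic FPN top-down path, not DN-FPN itself. It assumes nearest-neighbour 2x upsampling and element-wise addition, and it omits the 1x1 lateral and 3x3 output convolutions of a real FPN. In this form, each fused level simply sums a lateral feature with an upsampled coarser one, and no term regularizes the two against each other. The function names are illustrative.

```python
import numpy as np


def upsample_nearest_2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_fusion(laterals):
    """Generic FPN top-down path (illustrative sketch; convolutions omitted).

    laterals: list of (C, H, W) arrays ordered finest -> coarsest, each level
    half the spatial size of the previous one. Returns fused maps in the same order.
    Each fused level = its lateral feature + upsampled coarser fused level; nothing
    constrains the two summands to agree, which is where noise can enter.
    """
    fused = [None] * len(laterals)
    fused[-1] = laterals[-1]  # the coarsest level has nothing above it to fuse
    for i in range(len(laterals) - 2, -1, -1):
        fused[i] = laterals[i] + upsample_nearest_2x(fused[i + 1])
    return fused
```

Because the fusion is a plain sum, any spurious activation in a coarse level propagates into every finer level below it, which is the level-wise noise the DN-FPN module is described as suppressing.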

Problem

Research questions and friction points this paper is trying to address.

Improving tiny object detection in computer vision

Reducing noise in feature fusion for multiscale objects

Enhancing object representation with Transformer-based detector
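Trans R-CNN is described here only as applying self-attention to the representation of tiny objects. One plausible reading treats the S x S cells of a pooled RoI feature as tokens and applies single-head scaled dot-product self-attention. This is an assumption, not the authors' architecture. The projection matrices and function names below are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def roi_self_attention(roi_feat, w_q, w_k, w_v):
    """Single-head self-attention over the spatial cells of one RoI (hypothetical sketch).

    roi_feat: (C, S, S) pooled RoI feature (e.g. from RoIAlign).
    w_q, w_k, w_v: (C, D) projection matrices (illustrative, not learned here).
    Returns (S*S, D) attended tokens and the (S*S, S*S) attention weights.
    """
    c, s1, s2 = roi_feat.shape
    tokens = roi_feat.reshape(c, s1 * s2).T        # (N, C), one token per cell
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot-product, (N, N)
    attn = softmax(scores, axis=-1)                # each row sums to 1
    return attn @ v, attn
```

Letting every cell attend to every other cell gives each position of a tiny object's RoI access to context across the whole pooled window, rather than only a local convolutional neighbourhood.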

Innovation

Methods, ideas, or system contributions that make the work stand out.

DeNoising FPN suppresses noise via contrastive learning

Trans R-CNN uses self-attention for tiny objects

Plug-in design improves multiscale feature fusion
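The source states that DN-FPN uses contrastive learning but does not give the loss. As an assumption, the sketch below uses a generic InfoNCE loss, in which each anchor embedding should be most similar to its own positive and dissimilar to the other samples' positives. How anchors and positives would be drawn from FPN levels (for example, the same object's features before and after top-down fusion) is a hypothetical pairing, not the paper's definition.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (ASSUMED stand-in for the paper's contrastive objective).

    anchors[i] and positives[i] form the positive pair; positives[j], j != i,
    serve as negatives for anchors[i]. Both inputs are (N, D) embeddings.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                        # (N, N) scaled cosine similarity
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # mean -log p(correct pair)
```

Lowering `temperature` sharpens the softmax so that hard negatives are penalized more heavily; 0.1 is an arbitrary illustrative value.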
