SaRPFF: A Self-Attention with Register-based Pyramid Feature Fusion module for enhanced RLD detection

📅 2024-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of large-scale variation in rice leaf disease (RLD) lesions, high false-negative rates, and imprecise localization, this paper proposes the SaRPFF module. It introduces Register Tokens—a novel token-based guidance mechanism for 2D multi-head self-attention—to effectively suppress cross-scale attention artifacts. Combined with dilated convolutional attention and learnable deconvolutional upsampling, SaRPFF enables efficient and interpretable pyramid feature fusion. Integrated into YOLOv7, SaRPFF achieves a 2.61% AP improvement over the baseline FPN on the MRLD dataset, significantly outperforming BiFPN, NAS-FPN, and PANET. Cross-domain generalization is further validated on COCO. The core contributions lie in (1) Register Token–guided attention modeling, which decouples scale-aware representation learning from attention computation, and (2) a disentangled multi-scale fusion architecture that enhances both localization accuracy and lesion-scale adaptability.

📝 Abstract
Detecting objects across varying scales is still a challenge in computer vision, particularly in agricultural applications like Rice Leaf Disease (RLD) detection, where objects exhibit significant scale variations (SV). Conventional object detection (OD) methods such as Faster R-CNN, SSD, and YOLO often fail to effectively address SV, leading to reduced accuracy and missed detections. To tackle this, we propose SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion), a novel module designed to enhance multi-scale object detection. SaRPFF integrates 2D Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability by mitigating artifacts within MHSA. Additionally, it integrates efficient atrous (dilated) attention convolutions into the pyramid feature fusion and introduces a deconvolutional layer for refined up-sampling. We evaluate SaRPFF on YOLOv7 using the MRLD and COCO datasets. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7. Furthermore, SaRPFF outperforms other FPN variants, including BiFPN, NAS-FPN, and PANET, showcasing its versatility and potential to advance OD techniques. This study highlights SaRPFF's effectiveness in addressing SV challenges and its adaptability across FPN-based OD models.
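The register-token idea described in the abstract can be illustrated with a minimal NumPy sketch: extra learnable tokens are appended to the flattened 2D feature tokens, participate in multi-head self-attention (absorbing global/high-norm activity that would otherwise appear as attention artifacts), and are then discarded from the output. This is a toy illustration under assumed shapes and random projection weights, not the authors' SaRPFF implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_with_registers(tokens, registers, heads=4):
    """One multi-head self-attention pass over feature tokens plus register
    tokens. Registers attend and are attended to, but their outputs are
    dropped, so only the spatial tokens are returned."""
    n, d = tokens.shape
    x = np.concatenate([tokens, registers], axis=0)  # (n + r, d)
    dh = d // heads
    rng = np.random.default_rng(0)
    # Toy Q/K/V projections; a real module would learn these weights.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.empty_like(x)
    for h in range(heads):
        sl = slice(h * dh, (h + 1) * dh)
        att = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh))
        out[:, sl] = att @ v[:, sl]
    return out[:n]  # discard register-token outputs

# An 8x8 feature map with 16 channels, flattened to 64 tokens, plus 4 registers.
feat = np.random.default_rng(1).standard_normal((64, 16))
regs = np.zeros((4, 16))  # placeholder for learnable register embeddings
y = mhsa_with_registers(feat, regs)
assert y.shape == (64, 16)
```

The output keeps the spatial token count, so the attended features can be reshaped back to the 8x8 grid for the subsequent pyramid fusion stages.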
Problem

Research questions and friction points this paper is trying to address.

Computer Vision
Object Detection
Scale Variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

SaRPFF
Multi-scale Object Detection
Attention Mechanism
Yunusa Haruna
Beihang University
Deep Learning · Computer Vision · Foundation Model
Shiyin Qin
Beihang University
Abdulrahman Hamman Adama Chukkol
Beijing Institute of Technology
Isah Bello
Tianjin University
A. Lawan
Beihang University