🤖 AI Summary
To address the challenges of large-scale variation in rice leaf disease (RLD) lesions, high false-negative rates, and imprecise localization, this paper proposes the SaRPFF module. It introduces Register Tokens—a novel token-based guidance mechanism for 2D multi-head self-attention—to effectively suppress cross-scale attention artifacts. Combined with dilated convolutional attention and learnable deconvolutional upsampling, SaRPFF enables efficient and interpretable pyramid feature fusion. Integrated into YOLOv7, SaRPFF achieves a 2.61% AP improvement over the baseline FPN on the MRLD dataset, significantly outperforming BiFPN, NAS-FPN, and PANET. Cross-domain generalization is further validated on COCO. The core contributions lie in (1) Register Token–guided attention modeling, which decouples scale-aware representation learning from attention computation, and (2) a disentangled multi-scale fusion architecture that enhances both localization accuracy and lesion-scale adaptability.
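The register-token idea summarized above can be sketched as a toy NumPy example. This is an illustrative assumption, not the paper's implementation: the shapes, random weights, and single attention layer are made up for demonstration. The key mechanic is that register tokens are concatenated to the spatial tokens before attention, participate in the attention computation (where they can absorb global "artifact" activations), and are then discarded so only the spatial tokens carry forward.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_with_registers(tokens, registers, Wq, Wk, Wv, num_heads):
    """One multi-head self-attention pass over [registers; tokens].

    Register tokens attend and are attended to, but their outputs are
    dropped, so they act as a sink for attention artifacts rather than
    as part of the fused feature map.
    """
    x = np.concatenate([registers, tokens], axis=0)       # (R+N, D)
    n, d = x.shape
    hd = d // num_heads
    q = (x @ Wq).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (x @ Wk).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, num_heads, hd).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return out[registers.shape[0]:]                       # drop register outputs

rng = np.random.default_rng(0)
D, H, N, R = 8, 2, 16, 4        # embed dim, heads, spatial tokens, registers
tokens = rng.standard_normal((N, D))
registers = rng.standard_normal((R, D))   # learnable parameters in a real model
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
y = mhsa_with_registers(tokens, registers, Wq, Wk, Wv, H)
print(y.shape)  # (16, 8): only the spatial tokens remain
```

In a full model the registers would be learned parameters shared across images; here they are random only to keep the sketch self-contained.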
📝 Abstract
Detecting objects across varying scales remains a challenge in computer vision, particularly in agricultural applications such as Rice Leaf Disease (RLD) detection, where objects exhibit significant scale variations (SV). Conventional object detection (OD) methods such as Faster R-CNN, SSD, and YOLO often fail to address SV effectively, leading to reduced accuracy and missed detections. To tackle this, we propose SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion), a novel module designed to enhance multi-scale object detection. SaRPFF integrates 2D Multi-Head Self-Attention (MHSA) with register tokens, improving feature interpretability by mitigating artifacts within MHSA. Additionally, it incorporates efficient atrous-convolution attention into the pyramid feature fusion and introduces a deconvolutional layer for refined up-sampling. We evaluate SaRPFF on YOLOv7 using the MRLD and COCO datasets. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset over the baseline FPN method in YOLOv7. Furthermore, SaRPFF outperforms other FPN variants, including BiFPN, NAS-FPN, and PANet, showcasing its versatility and potential to advance OD techniques. This study highlights SaRPFF's effectiveness in addressing SV challenges and its adaptability across FPN-based OD models.
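The other two ingredients named in the abstract, atrous (dilated) convolution for an enlarged receptive field and a deconvolutional (transposed-convolution) layer for learnable up-sampling, can be illustrated with minimal single-channel NumPy implementations. These naive loops are assumptions for illustration only and are not SaRPFF's actual layers; real pyramid fusion operates on multi-channel feature maps with learned kernels.

```python
import numpy as np

def dilated_conv2d(x, w, dilation):
    """Naive 'valid' 2D convolution with dilation. x: (H, W), w: (k, k).

    Dilation samples the input on a spaced grid, so a k x k kernel covers
    a dilation*(k-1)+1 window: wider context at the same parameter cost.
    """
    k = w.shape[0]
    span = dilation * (k - 1) + 1          # effective receptive field
    H, W = x.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span:dilation, j:j + span:dilation]
            out[i, j] = (patch * w).sum()
    return out

def deconv2d(x, w, stride=2):
    """Naive transposed convolution: learnable up-sampling. x: (H, W), w: (k, k).

    Each input value stamps a scaled copy of the kernel into the output,
    unlike fixed nearest/bilinear interpolation.
    """
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H * stride + k - stride, W * stride + k - stride))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return out

coarse = np.arange(16, dtype=float).reshape(4, 4)
up = deconv2d(coarse, 0.25 * np.ones((2, 2)), stride=2)      # (4,4) -> (8,8)
feat = np.ones((7, 7))
ctx = dilated_conv2d(feat, np.ones((3, 3)) / 9, dilation=2)  # 3x3 kernel sees a 5x5 window
print(up.shape, ctx.shape)  # (8, 8) (3, 3)
```

In the fusion path sketched here, the deconvolution would up-sample a coarse pyramid level before merging, while the dilated convolution would aggregate wider context at the finer level.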