🤖 AI Summary
To address the need for efficient monitoring of declining pollinator populations in agricultural fields, particularly honeybees and bumblebees, this study introduces BuzzSet, a high-resolution, manually verified image dataset collected under real farmland conditions. Methodologically, we propose a semi-automated annotation pipeline that combines YOLOv12-based pre-labeling with human verification, and we split images into 256×256 tiles to improve the detection of small pollinators. We establish a detection baseline using the transformer-based RF-DETR detector, which achieves F1 scores of 0.94 and 0.92 for honeybees and bumblebees, respectively, and a best mAP@0.50 of 0.559. Together, the dataset and baseline provide a benchmark and a practical technical foundation for automated pollinator monitoring in agriculture, in support of ecosystem stability and food security.
📝 Abstract
Pollinator insects such as honeybees and bumblebees are vital to global food production and ecosystem stability, yet their populations are declining due to increasing anthropogenic and environmental stressors. To support scalable, automated pollinator monitoring, we introduce BuzzSet, a new large-scale dataset of high-resolution pollinator images collected in real agricultural field conditions. BuzzSet contains 7856 manually verified and labeled images, with over 8000 annotated instances across three classes: honeybees, bumblebees, and unidentified insects. Initial annotations were generated using a YOLOv12 model trained on external data and refined via human verification using open-source labeling tools. All images were preprocessed into $256 \times 256$ tiles to improve the detection of small insects. We provide strong baselines using the RF-DETR transformer-based object detector. The model achieves high F1-scores of 0.94 and 0.92 for honeybee and bumblebee classes, respectively, with confusion matrix results showing minimal misclassification between these categories. The unidentified class remains more challenging due to label ambiguity and lower sample frequency, yet still contributes useful insights for robustness evaluation. Overall detection quality is strong, with a best mAP@0.50 of 0.559. BuzzSet offers a valuable benchmark for small object detection, class separation under label noise, and ecological computer vision.
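The tiling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether tiles overlap or how image borders are handled, so the version below makes two assumptions: tiles are non-overlapping, and the bottom and right edges are zero-padded up to a multiple of the tile size. The function name `tile_image` is hypothetical and is not taken from the BuzzSet code.

```python
import numpy as np


def tile_image(image, tile=256):
    """Split an H x W (x C) array into non-overlapping tile x tile patches.

    Assumption (not stated in the abstract): edges are zero-padded so that
    height and width become multiples of `tile`.
    Returns a list of ((x, y), patch) pairs, where (x, y) is the top-left
    corner of the patch in the padded image.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile  # rows needed to reach the next multiple of `tile`
    pad_w = (-w) % tile  # columns needed to reach the next multiple of `tile`
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")

    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(((x, y), padded[y:y + tile, x:x + tile]))
    return tiles


# Usage: a 600 x 500 RGB image is padded to 768 x 512 and cut into 3 x 2 tiles.
example = np.zeros((600, 500, 3), dtype=np.uint8)
patches = tile_image(example)
```

Keeping the top-left corner of each tile makes it possible to shift bounding boxes into tile coordinates, or to map tile-level detections back to the full image. An overlapping stride would be a reasonable alternative when insects are likely to straddle tile borders.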