🤖 AI Summary
Small-object detection of apples in orchard scenes is severely hindered by scarce annotated data, intrinsically small object size, and frequent occlusion. To address these challenges, the paper proposes S3AD, a semi-supervised detection framework that combines contextual attention with selective tiling. It introduces MAD, a large, high-resolution apple dataset comprising 105 labeled images with 14,667 annotated apple instances and 4,440 unlabeled images. A lightweight contextual attention module and an adaptive tiling strategy are designed to enhance feature representation and localization accuracy for small objects while limiting computational overhead. Extensive experiments on MAD and the MSU dataset show that S3AD substantially outperforms strong fully supervised baselines, including several small object detection systems, by up to 14.9%. Analyses exploiting the dataset's detailed annotations further quantify the influence of relative object size and occlusion level on detection performance, providing actionable insights for agricultural vision systems.
📝 Abstract
Crop detection is integral to precision agriculture applications such as automated yield estimation or fruit picking. However, crop detection, e.g., apple detection in orchard environments, remains challenging due to a lack of large-scale datasets and the small relative size of the crops in the image. In this work, we address these challenges by reformulating the apple detection task in a semi-supervised manner. To this end, we provide the large, high-resolution dataset MAD comprising 105 labeled images with 14,667 annotated apple instances and 4,440 unlabeled images. Utilizing this dataset, we also propose a novel Semi-Supervised Small Apple Detection system S3AD based on contextual attention and selective tiling to improve the challenging detection of small apples while limiting the computational overhead. We conduct an extensive evaluation on MAD and the MSU dataset, showing that S3AD substantially outperforms strong fully supervised baselines, including several small object detection systems, by up to 14.9%. Additionally, we exploit the detailed annotations of our dataset w.r.t. apple properties to analyze the influence of relative size and level of occlusion on the results of various systems, quantifying current challenges.