🤖 AI Summary
Existing apple orchard monitoring datasets suffer from insufficient scene diversity, labor-intensive annotation, inadequate coverage of phenological growth stages, and a lack of stereo imagery, all of which limit progress in fruit localization, yield estimation, and 3D reconstruction. To address these gaps, we introduce the first large-scale binocular stereo image dataset spanning the complete apple growth cycle, systematically aligned with the BBCH phenological scale and enriched with dense apple annotations and agronomically validated growth-stage labels. We establish a standardized benchmark integrating agricultural science and computer vision, bridging critical gaps in growth modeling and 3D perception. Adding this dataset to existing training data yields substantial improvements in fruit detection: F1-score gains of 7.69% for YOLOv8 (extending MinneApple) and 31.06% for Faster R-CNN (extending MinneApple and MAD). Six-stage phenological classification accuracy exceeds 95%, and the stereo imagery supports fruit localization and orchard-scale 3D reconstruction.
📝 Abstract
Deep learning has transformed computer vision for precision agriculture, yet apple orchard monitoring remains limited by dataset constraints: diverse, realistic datasets are lacking, and dense, heterogeneous scenes are difficult to annotate. Existing datasets overlook different growth stages and stereo imagery, both essential for realistic 3D modeling of orchards and for tasks such as fruit localization, yield estimation, and structural analysis. To address these gaps, we present AppleGrowthVision, a large-scale dataset comprising two subsets. The first includes 9,317 high-resolution stereo images collected from a farm in Brandenburg (Germany), covering six agriculturally validated growth stages over a full growth cycle. The second subset consists of 1,125 densely annotated images from the same farm in Brandenburg and one in Pillnitz (Germany), containing a total of 31,084 apple labels. AppleGrowthVision provides stereo-image data with agriculturally validated growth stages, enabling precise phenological analysis and 3D reconstructions. Extending MinneApple with our data improves YOLOv8 performance by 7.69% in terms of F1-score, while adding it to MinneApple and MAD boosts the Faster R-CNN F1-score by 31.06%. Additionally, six BBCH stages were predicted with over 95% accuracy using VGG16, ResNet152, DenseNet201, and MobileNetV2. AppleGrowthVision bridges the gap between agricultural science and computer vision by enabling the development of robust models for fruit detection, growth modeling, and 3D analysis in precision agriculture. Future work includes improving annotation, enhancing 3D reconstruction, and extending multimodal analysis across all growth stages.