🤖 AI Summary
Traditional assessment of grape cluster compactness relies on manual visual scoring, which is highly subjective, slow, and provides no fine-grained berry-level data. To address these limitations, this work introduces ViViD-5k, a large-scale, multi-varietal dataset of 5,000 field-collected grape images with berry-level annotations. The authors propose GrapeSAM, a two-stage visual pipeline that requires only minimal supervision. In the first stage, point annotations guide berry localization. In the second, the Segment Anything Model (SAM) performs prompt-based segmentation, and a Transformer module produces cluster-level segmentation used for compactness estimation. The method achieves strong segmentation and counting accuracy and remains robust on both in-domain and out-of-domain samples, offering an objective, scalable, and automated solution for high-throughput grape phenotyping.
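The summary does not say how compactness is computed from the segmentation outputs. As one hedged illustration, assume closure is approximated as the fraction of the segmented cluster area covered by the union of berry masks, so that `1 - ratio` is the visible gap fraction. The function `closure_ratio` below is a hypothetical proxy written for this sketch and is not the authors' formula.

```python
from typing import Sequence

import numpy as np


def closure_ratio(cluster_mask: np.ndarray, berry_masks: Sequence[np.ndarray]) -> float:
    """Fraction of the cluster area covered by the union of berry masks.

    Hypothetical proxy: 1.0 means no visible gaps; 1 - ratio is the gap fraction.
    """
    cluster = np.asarray(cluster_mask, dtype=bool)
    area = int(cluster.sum())
    if area == 0:
        raise ValueError("cluster mask is empty")
    berries = np.zeros_like(cluster)
    for mask in berry_masks:
        # Union of masks, so overlapping berries are counted only once.
        berries |= np.asarray(mask, dtype=bool)
    # Ignore berry pixels that fall outside the cluster mask.
    covered = int(np.logical_and(berries, cluster).sum())
    return covered / area


# Toy example: the cluster is the left half of a 10x10 image (50 pixels).
cluster = np.zeros((10, 10), dtype=bool)
cluster[:, :5] = True
b1 = np.zeros((10, 10), dtype=bool)
b1[0:4, 0:4] = True  # 16 pixels inside the cluster
b2 = np.zeros((10, 10), dtype=bool)
b2[5:9, 5:9] = True  # entirely outside the cluster
print(closure_ratio(cluster, [b1, b1, b2]))
```

Taking the union before intersecting with the cluster keeps duplicate or overlapping berry masks from inflating the ratio. A real closure measure would also need to account for berries hidden behind others in a 2D view, which this proxy ignores.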
📝 Abstract
Cluster closure, defined as the progressive filling of gaps between the berries in a grape bunch, is a key trait in vineyard management because it affects disease risk. However, traditional visual scoring methods are labor-intensive, subjective, and lack temporal resolution. Existing datasets rarely support fine-grained berry-level analysis, which limits the development of robust deep learning models. In this work, we present ViViD-5k, a large-scale in-field Vineyard Vision Dataset containing 5,000 images with dense annotations, including over 648,000 berry centroids and cluster segmentation masks spanning 13 grape varieties. Building on this dataset, we introduce GrapeSAM, a two-stage visual pipeline that combines point-based berry localization with prompt-based segmentation using the Segment Anything Model (SAM), followed by transformer-based cluster segmentation. The pipeline enables automated, in-field estimation of cluster closure with minimal supervision. Quantitative results demonstrate strong segmentation and counting accuracy across diverse conditions, while visualizations confirm robustness on both in-domain and out-of-domain samples. This work provides a scalable and objective alternative to manual compactness scoring and supports high-throughput grape phenotyping with enhanced spatial detail.
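The abstract does not describe how berry centroids are decoded or passed to SAM. A common pattern, assumed here, has a localization network predict a heatmap whose local maxima mark berry centroids. The hypothetical helpers below are all illustrative names, not the authors' code:

- `heatmap_peaks` decodes the peaks and counts them.
- `to_point_prompts` converts the peaks to (x, y) point coordinates with foreground labels. This matches the `point_coords` / `point_labels` layout that the public `segment_anything` `SamPredictor.predict` accepts.
- `count_mae` scores counting with mean absolute error, one common counting metric.

```python
import numpy as np


def heatmap_peaks(heatmap: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return (row, col) indices of 8-neighbourhood local maxima >= threshold."""
    h = np.asarray(heatmap, dtype=float)
    rows, cols = h.shape
    # Pad with -inf so border pixels are compared only against real neighbours.
    padded = np.pad(h, 1, mode="constant", constant_values=-np.inf)
    is_peak = h >= threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            is_peak &= h >= neighbour
    return np.argwhere(is_peak)


def to_point_prompts(peaks_rc: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert (row, col) peaks to (x, y) coordinates plus foreground labels."""
    coords_xy = peaks_rc[:, ::-1].astype(np.float32)
    labels = np.ones(len(peaks_rc), dtype=np.int32)  # 1 marks a foreground point
    return coords_xy, labels


def count_mae(pred_counts, true_counts) -> float:
    """Mean absolute error between predicted and reference berry counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))


# Toy heatmap with two confident berries and one weak response.
heatmap = np.zeros((7, 7))
heatmap[1, 1] = 0.9
heatmap[5, 4] = 0.8
heatmap[3, 3] = 0.3  # below the threshold, so it is not reported
peaks = heatmap_peaks(heatmap)
coords, labels = to_point_prompts(peaks)
print(len(peaks), coords.tolist(), labels.tolist())
```

Because the comparison uses `>=`, a flat plateau of equal maxima yields several adjacent peaks. A practical decoder would add non-maximum suppression or a minimum peak distance, and the threshold of 0.5 is an arbitrary placeholder.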