🤖 AI Summary
Traditional plant phenotyping suffers from observer bias and subjectivity, hindering fine-grained, reproducible quantitative analysis. To address this, we introduce TomatoMAP—the first benchmark dataset for multi-view, multi-pose tomato phenotyping—featuring pixel-level segmentation annotations across seven anatomical regions and fine-grained classification of 50 BBCH growth stages. Methodologically, we propose an end-to-end cascaded framework integrating MobileNetV3 (for stage classification), YOLOv11 (for object detection), and Mask R-CNN (for instance segmentation), jointly leveraging semantic and instance segmentation for granular phenotypic parsing. Experiments demonstrate that our model achieves expert-level performance: classification accuracy and inference speed comparable to those of five domain experts, with a Cohen's Kappa of 0.89 and high inter-rater consistency validated via rater agreement heatmaps. This work establishes a novel, publicly available data foundation and a reproducible technical paradigm for AI-driven crop phenotyping.
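The cascaded flow described above can be sketched as a simple pipeline: classify the growth stage, detect ROIs, then segment instances within each detected region. The three model calls below are hypothetical placeholders standing in for MobileNetV3, YOLOv11, and Mask R-CNN; none of the function names or return formats come from the paper.

```python
# Minimal sketch of a classification -> detection -> segmentation cascade.
# All three "models" are stubs; in practice each would wrap a trained network.

def classify_growth_stage(image):
    # Placeholder for a MobileNetV3 classifier over the 50 BBCH stages.
    return "BBCH-65"

def detect_rois(image):
    # Placeholder for a YOLOv11 detector over the seven ROI classes.
    return [{"label": "leaves", "box": (10, 10, 120, 200)},
            {"label": "batch of flowers", "box": (40, 60, 90, 110)}]

def segment_instances(image, roi):
    # Placeholder for Mask R-CNN applied to the detected ROI.
    return {"label": roi["label"], "mask_area_px": 1500}

def phenotype(image):
    # Run the full cascade and collect stage, ROIs, and instance masks.
    stage = classify_growth_stage(image)
    rois = detect_rois(image)
    masks = [segment_instances(image, roi) for roi in rois]
    return {"bbch_stage": stage, "rois": rois, "masks": masks}

result = phenotype(image=None)
print(result["bbch_stage"], len(result["masks"]))  # BBCH-65 2
```

The cascade design means each stage only sees inputs already filtered by the previous one, which is what lets lightweight per-stage models reach competitive overall accuracy.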
📝 Abstract
Observer bias and inconsistencies in traditional plant phenotyping methods limit the accuracy and reproducibility of fine-grained plant analysis. To overcome these challenges, we developed TomatoMAP, a comprehensive dataset for Solanum lycopersicum built with an Internet of Things (IoT) based imaging system and standardized data acquisition protocols. Our dataset contains 64,464 RGB images that capture 12 different plant poses from four camera elevation angles. Each image includes manually annotated bounding boxes for seven regions of interest (ROIs), namely leaves, panicle, batch of flowers, batch of fruits, axillary shoot, shoot, and whole plant area, along with 50 fine-grained growth stage classifications based on the BBCH scale. Additionally, we provide a 3,616-image high-resolution subset with pixel-wise semantic and instance segmentation annotations for fine-grained phenotyping. We validated our dataset using a cascaded deep learning framework combining MobileNetV3 for classification, YOLOv11 for object detection, and Mask R-CNN for segmentation. Through an AI vs. human analysis involving five domain experts, we demonstrate that models trained on our dataset achieve accuracy and speed comparable to those of the experts. Cohen's Kappa and inter-rater agreement heatmaps confirm the reliability of automated fine-grained phenotyping using our approach.
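Cohen's Kappa, the agreement statistic used above, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of computing it between a model's BBCH-stage predictions and one expert's labels (the data here is toy data, not the paper's results):

```python
# Cohen's kappa from scratch: chance-corrected agreement between two raters.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy labels for six plants (hypothetical, for illustration only).
model_preds   = ["BBCH-12", "BBCH-12", "BBCH-65", "BBCH-65", "BBCH-89", "BBCH-89"]
expert_labels = ["BBCH-12", "BBCH-12", "BBCH-65", "BBCH-89", "BBCH-89", "BBCH-89"]

print(round(cohens_kappa(model_preds, expert_labels), 3))  # 0.75
```

Raw agreement here is 5/6, but kappa deflates it to 0.75 because some of that agreement would occur by chance; the same correction underlies the 0.89 figure reported for the full expert comparison.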