🤖 AI Summary
Manual counting of kernels per row on maize ears is labor-intensive and error-prone, while existing automated methods rely heavily on large annotated datasets and generalize poorly across diverse field conditions. Method: We propose a zero-shot, annotation-free, RGB-image-based phenotyping framework that integrates the Segment Anything Model (SAM) for kernel segmentation (first applied to agricultural grain analysis) with connected-component analysis, contour fitting, and graph-structured modeling (including topological sorting) to robustly infer kernel arrangement topology and count the kernels in each row without any model training. Contribution/Results: The method achieves high accuracy across multiple maize varieties, ear orientations, and illumination conditions, reducing phenotyping cost and observer bias. All source code is publicly released, supporting scalable, low-cost, and objective field-level assessment of yield components.
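The connected-component step named above can be illustrated with a minimal, self-contained sketch: given a binary kernel mask (in the real pipeline this would come from SAM's segmentation output, which is not reproduced here), each 4-connected foreground blob is treated as one candidate kernel. The function name and the toy mask are illustrative, not the authors' code.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary grid.

    `mask` is a list of lists of 0/1. Each component stands in for one
    segmented kernel; a real pipeline would label a SAM-derived mask
    (hypothetical stand-in, not the paper's implementation).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # found a new blob
                q = deque([(r, c)])
                seen[r][c] = True
                while q:                        # flood-fill the blob
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

# Toy mask with five separate "kernels".
toy = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
print(count_components(toy))  # -> 5
```

In practice a library routine such as `scipy.ndimage.label` or OpenCV's `connectedComponents` would replace the hand-rolled flood fill; the sketch only makes the counting logic explicit.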
📝 Abstract
Quantifying the variation in yield component traits of maize (Zea mays L.), which together determine the overall productivity of this globally important crop, plays a critical role in plant genetics research, plant breeding, and the development of improved farming practices. Grain yield per acre is the product of the number of plants per acre, the number of ears per plant, the number of kernels per ear, and the average kernel weight. The number of kernels per ear, in turn, is the number of kernel rows per ear multiplied by the number of kernels per row. Traditional manual methods for measuring these two traits are time-consuming, limiting large-scale data collection. Recent automation efforts based on image processing and deep learning face challenges such as high annotation costs and uncertain generalizability. We tackle these issues by exploring large vision models for zero-shot, annotation-free maize kernel segmentation. Using an open-source large vision model, the Segment Anything Model (SAM), we segment individual kernels in RGB images of maize ears and apply a graph-based algorithm to calculate the number of kernels per row. Our approach successfully identifies the number of kernels per row across a wide range of maize ears, demonstrating the potential of zero-shot learning with foundation vision models, combined with classical image processing, to improve automation and reduce subjectivity in agronomic data collection. All our code is open-sourced to make these affordable phenotyping methods accessible to everyone.
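As a rough illustration of the graph-based grouping idea (the paper's actual algorithm is not reproduced here), the sketch below sweeps kernel centroids along the ear axis and chains those that share a lateral band into one row; each chain's length is that row's kernel count. The function name, the coordinate convention (x along the ear axis, y lateral), and the `row_tol` threshold are assumptions made for this toy example.

```python
def kernels_per_row(centroids, row_tol=10.0):
    """Group kernel centroids into rows and count kernels per row.

    A toy stand-in for the paper's graph-based step: centroids whose
    lateral position stays within `row_tol` pixels of a row's running
    mean are chained into that row (illustrative, not the authors' API).
    """
    rows = []                                   # each row: list of centroids
    for x, y in sorted(centroids):              # sweep along the ear axis
        for row in rows:
            mean_y = sum(p[1] for p in row) / len(row)
            if abs(y - mean_y) <= row_tol:      # same lateral band -> same row
                row.append((x, y))
                break
        else:
            rows.append([(x, y)])               # start a new row
    return [len(row) for row in rows]

# Two toy rows of kernels along a horizontal ear axis.
pts = [(0, 0), (20, 1), (40, 2), (0, 30), (20, 31), (40, 29), (60, 30)]
print(kernels_per_row(pts))  # -> [3, 4]
```

On a real ear the rows curve around the cob, so a robust version would follow neighbor-to-neighbor edges in a centroid graph rather than a fixed lateral band; the sketch keeps only the counting idea.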