🤖 AI Summary
Current apple harvesting robots lack precise ripeness and size estimation, which leads to non-selective harvesting. To address this, we propose an RGB-D-based framework that jointly estimates apple ripeness and size for selective harvesting. The method uses Grounding-DINO, a language-model-based object detector, for apple detection and binary ripeness classification ("Ripe" vs. "Unripe"), and it evaluates six size estimation algorithms. We curate the Fuji-Ripeness-Size Dataset from two public RGB-D Fuji apple image datasets, adding ripeness annotations based on fruit color and image capture dates alongside size labels; it comprises 4,027 images and 16,257 annotated instances. In experiments, Grounding-DINO outperforms other state-of-the-art models on apple detection and ripeness classification, and the size estimation algorithm with the lowest error and variation is selected. The dataset and the apple detection and size estimation algorithms are publicly available.
📝 Abstract
Harvesting is a critical task in the tree fruit industry: it demands extensive manual labor, incurs substantial costs, and exposes workers to potential hazards. Recent advances in automated harvesting offer a promising solution by enabling efficient, cost-effective, and ergonomic fruit picking within tight harvesting windows. However, existing harvesting technologies often pick all visible and accessible fruits indiscriminately, including those that are unripe or undersized. This study introduces a novel foundation-model-based framework for efficient apple ripeness and size estimation. Specifically, we curated two public RGB-D Fuji apple image datasets and expanded their annotations with ripeness labels ("Ripe" vs. "Unripe") based on fruit color and image capture dates. The resulting Fuji-Ripeness-Size Dataset includes 4,027 images and 16,257 annotated apples with ripeness and size labels. Using Grounding-DINO, a language-model-based object detector, we achieved robust apple detection and ripeness classification, outperforming other state-of-the-art models. Additionally, we developed and evaluated six size estimation algorithms and selected the one with the lowest error and variation. The Fuji-Ripeness-Size Dataset and the apple detection and size estimation algorithms are made publicly available, providing valuable benchmarks for future studies in automated and selective harvesting.
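The abstract does not specify the six size estimation algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions. It shows one common RGB-D approach (a pinhole-camera conversion from a bounding-box width in pixels to a metric width) and one way to rank candidate estimators by error and variation. The function names, the pinhole formulation, and the use of mean absolute error with standard deviation as the ranking criteria are all illustrative assumptions, not the authors' method.

```python
import statistics


def pinhole_diameter_mm(bbox_width_px, depth_mm, focal_length_px):
    """Approximate an object's metric width from its pixel width.

    Assumes an ideal pinhole camera and a fruit roughly facing the sensor:
    width_mm = width_px * depth_mm / focal_length_px.
    """
    return bbox_width_px * depth_mm / focal_length_px


def rank_estimators(errors_by_name):
    """Rank hypothetical estimators by (mean absolute error, error spread).

    errors_by_name maps an estimator name to its signed errors in mm
    (prediction minus ground truth). Lower is better on both criteria;
    the spread (population standard deviation) only breaks ties on MAE.
    """
    scored = []
    for name, errors in errors_by_name.items():
        mae = statistics.mean(abs(e) for e in errors)
        spread = statistics.pstdev(errors)
        scored.append((mae, spread, name))
    return [name for _, _, name in sorted(scored)]
```

For example, a detection 100 px wide at a depth of 1200 mm, seen by a camera with a 1500 px focal length, yields an estimated width of 80 mm. In practice the depth value would come from the aligned depth channel of the RGB-D frame, and the focal length from the camera intrinsics.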