🤖 AI Summary
Existing fruit-counting methods (e.g., FruitNeRF) rely on category-specific modeling, which generalizes poorly and incurs high deployment overhead. This work proposes the first shape-agnostic, universal multi-fruit counting framework for unstructured orchard imagery, requiring no per-category model adaptation. The core innovation is the first integration of SAM-generated instance masks with a neural instance field (NIF), augmented by contrastive learning, voxelized point-cloud embedding, and hierarchical clustering, enabling fruit-agnostic joint 3D geometric and semantic modeling. Evaluated on six synthetic fruit categories and real-world apple images, the method significantly outperforms state-of-the-art approaches, demonstrating strong cross-category generalization and end-to-end controllability.
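The summary mentions contrastive learning over SAM-generated instance masks. A common way to realize this is a supervised contrastive loss that pulls embeddings of pixels from the same mask together and pushes different masks apart; the sketch below is an illustrative stand-in (the function name, shapes, and temperature are assumptions, not the paper's actual implementation):

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(emb: torch.Tensor, ids: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over instance embeddings.

    emb: (N, D) embeddings rendered for N sampled pixels.
    ids: (N,) instance-mask ids (e.g., from SAM) for the same pixels.
    Pixels sharing a mask id are pulled together; others are pushed apart.
    """
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / temperature                 # pairwise cosine similarity
    n = emb.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=emb.device)
    pos = (ids.unsqueeze(0) == ids.unsqueeze(1)) & ~eye  # same-instance pairs
    logits = sim.masked_fill(eye, float("-inf"))      # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~pos, 0.0)        # keep only positive pairs
    return (-log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()

# Toy check: perfectly separated embeddings give near-zero loss.
emb = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
ids = torch.tensor([0, 0, 1, 1])
loss = instance_contrastive_loss(emb, ids)
```

Minimizing such a loss makes per-fruit embeddings separable in feature space, which is what later allows a generic, fruit-agnostic clustering step.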
📝 Abstract
We introduce FruitNeRF++, a novel fruit-counting approach that combines contrastive learning with neural radiance fields to count fruits in unstructured photographs of orchards. Our work builds on FruitNeRF, which employs a neural semantic field combined with a fruit-specific clustering approach. The need to adapt the pipeline to each fruit type limits the method's applicability and makes it difficult to use in practice. To lift this limitation, we design a shape-agnostic multi-fruit counting framework that complements the RGB and semantic data with instance masks predicted by a vision foundation model. The masks are used to encode the identity of each fruit as instance embeddings in a neural instance field. By volumetrically sampling the neural fields, we extract a point cloud embedded with the instance features, which can be clustered in a fruit-agnostic manner to obtain the fruit count. We evaluate our approach on a synthetic dataset containing apples, plums, lemons, pears, peaches, and mangoes, as well as a real-world benchmark apple dataset. Our results demonstrate that FruitNeRF++ is easier to control and compares favorably to other state-of-the-art methods.
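The final counting step described above (volumetric sampling, then fruit-agnostic clustering of the instance-embedded point cloud) can be sketched with synthetic data. This is a minimal illustration, not the paper's implementation: the blob geometry, embedding dimensionality, and distance threshold are all assumptions, and single-linkage hierarchical clustering stands in for whatever clustering the authors actually use.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic stand-in for the volumetrically sampled point cloud:
# three "fruits", each a tight blob of 3D positions carrying a
# per-fruit instance embedding (all shapes and values illustrative).
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
instance_dirs = np.eye(3)  # one distinct embedding direction per fruit
points, feats = [], []
for c, e in zip(centers, instance_dirs):
    points.append(c + 0.02 * rng.standard_normal((200, 3)))
    feats.append(e + 0.02 * rng.standard_normal((200, 3)))
features = np.concatenate(
    [np.concatenate(points), np.concatenate(feats)], axis=1
)  # joint (position, embedding) feature per point

# Fruit-agnostic count: single-linkage hierarchical clustering on the
# joint features, cut at a distance threshold, so the number of fruits
# never has to be specified in advance.
labels = fcluster(linkage(features, method="single"), t=0.5, criterion="distance")
fruit_count = len(set(labels))
print(fruit_count)  # → 3
```

Because the decision is made in the joint position-plus-embedding space with a fixed threshold rather than a per-species heuristic, the same clustering code applies unchanged to apples, plums, or any other fruit category.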