CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting

📅 2025-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Fruit counting in orchards is hampered by severe occlusion, semantic ambiguity, and the high computational cost of 3D reconstruction. To address these challenges, we propose a language-guided semantic Gaussian splatting method. Our approach introduces the first language-aligned semantic Gaussian representation, enabling zero-shot, prompt-driven 3D instance filtering. By integrating radius-aware pruning, tiled rasterization, and distribution-aware sampling, it achieves real-time rendering (>30 FPS) without sacrificing accuracy. Furthermore, 3D spatial prompt filtering and density-based clustering support open-set, cross-variety semantic queries. Evaluated on real-world orchard data, our method achieves a counting error below 4.2%, significantly outperforming NeRF-based baselines. To the best of our knowledge, this is the first framework enabling open-vocabulary, semantically controllable, high-accuracy, and real-time 3D fruit counting.
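The prompt-driven filtering step can be pictured as a similarity test between each Gaussian's language embedding and the embedding of a text prompt. The sketch below is only an illustration under stated assumptions: the 3-dimensional toy vectors, the `filter_by_prompt` helper, and the 0.8 threshold are hypothetical, and the summary does not say which similarity measure or threshold the authors use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def filter_by_prompt(embeddings, prompt_embedding, threshold=0.8):
    """Return indices of Gaussians whose embedding is close to the prompt.

    Hypothetical helper: the threshold value is an assumption for illustration.
    """
    return [
        i for i, e in enumerate(embeddings)
        if cosine(e, prompt_embedding) >= threshold
    ]


# Toy 3-D vectors standing in for compressed, CLIP-aligned per-Gaussian features.
gaussian_embeddings = [
    [0.9, 0.1, 0.0],  # fruit-like
    [0.1, 0.9, 0.1],  # leaf-like
    [0.8, 0.2, 0.1],  # fruit-like
    [0.0, 0.1, 0.9],  # trunk-like
]
apple_prompt = [1.0, 0.0, 0.0]  # hypothetical text embedding for "apple"

selected = filter_by_prompt(gaussian_embeddings, apple_prompt)
print(selected)
```

Because the test runs per Gaussian in 3D, no image-space segmentation mask is involved; changing the prompt embedding changes which Gaussians are kept.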

📝 Abstract

Accurate fruit counting in real-world agricultural environments is a longstanding challenge due to visual occlusions, semantic ambiguity, and the high computational demands of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed, limited generalization, and a lack of support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data demonstrate that FruitLangGS achieves higher rendering speed, semantic flexibility, and counting accuracy than prior approaches, offering a new perspective for language-driven, real-time neural rendering across open-world scenarios.
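The last two stages of the pipeline described above (sampling points from the selected Gaussians, then clustering them to obtain a count) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes axis-aligned Gaussians given as `(mean, std)` pairs, draws a fixed number of samples per Gaussian, and uses a plain DBSCAN-style pass as the density-based clustering. The names `sample_points` and `dbscan_count` and the values of `eps` and `min_pts` are hypothetical.

```python
import math
import random


def sample_points(gaussians, per_gaussian=40, seed=0):
    """Draw points from each Gaussian so the cloud follows its distribution.

    Assumption: each Gaussian is axis-aligned and given as (mean, std) triples.
    """
    rng = random.Random(seed)
    points = []
    for mean, std in gaussians:
        for _ in range(per_gaussian):
            points.append(tuple(rng.gauss(m, s) for m, s in zip(mean, std)))
    return points


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan_count(points, eps=0.1, min_pts=5):
    """Count clusters with a basic DBSCAN pass (O(n^2), fine for a toy cloud)."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region_query(points, j, eps)
            if len(nb) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(nb)
        cluster += 1
    return cluster


# Two toy "fruits" about one unit apart, each covered by two small Gaussians.
selected_gaussians = [
    ((0.00, 0.00, 0.0), (0.01, 0.01, 0.01)),
    ((0.02, 0.00, 0.0), (0.01, 0.01, 0.01)),
    ((1.00, 0.00, 0.0), (0.01, 0.01, 0.01)),
    ((1.00, 0.02, 0.0), (0.01, 0.01, 0.01)),
]
points = sample_points(selected_gaussians)
fruit_count = dbscan_count(points)
print(len(points), fruit_count)
```

Several Gaussians usually cover one fruit, which is why the count comes from clusters of sampled points and not from the number of selected Gaussians.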

Problem

Research questions and friction points this paper is trying to address.

Real-time 3D fruit counting in occluded agricultural environments

Overcoming slow inference and limited generalization in existing methods

Enabling open-set semantic control via language-guided reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Gaussian splatting for efficient 3D reconstruction

CLIP-aligned semantic embedding for open-set control

Prompt-based 3D semantic filtering without segmentation
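For the efficiency side of these contributions, the sketch below shows one plausible reading of radius-aware pruning combined with tile-based rasterization: splats whose projected radius is below a threshold are dropped, and the remaining splats are binned into fixed-size screen tiles. The 0.5-pixel threshold, the 16-pixel tile size, and the helper names `tiles_for_splat` and `bin_splats` are assumptions for illustration; this page does not describe the paper's actual pruning criterion.

```python
def tiles_for_splat(cx, cy, radius, width, height, tile=16):
    """Return (tx, ty) coordinates of the tiles touched by a splat's bounding square."""
    tiles_x = (width + tile - 1) // tile   # number of tile columns (ceil division)
    tiles_y = (height + tile - 1) // tile  # number of tile rows
    x0 = max(0, int((cx - radius) // tile))
    x1 = min(tiles_x - 1, int((cx + radius) // tile))
    y0 = max(0, int((cy - radius) // tile))
    y1 = min(tiles_y - 1, int((cy + radius) // tile))
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


def bin_splats(splats, width, height, min_radius=0.5, tile=16):
    """Prune tiny splats, then map each tile to the splat indices overlapping it.

    Assumption: each splat is (center_x, center_y, radius) in pixels.
    """
    bins = {}
    for idx, (cx, cy, radius) in enumerate(splats):
        if radius < min_radius:
            continue  # hypothetical radius-aware pruning: skip sub-threshold splats
        for key in tiles_for_splat(cx, cy, radius, width, height, tile):
            bins.setdefault(key, []).append(idx)
    return bins


splats = [
    (8.0, 8.0, 4.0),    # fits inside tile (0, 0)
    (16.0, 16.0, 4.0),  # straddles four tiles
    (40.0, 8.0, 0.1),   # very small projected radius: pruned
]
bins = bin_splats(splats, width=64, height=32)
print(sorted(bins.items()))
```

Binning lets each tile be shaded from its own short list of splats, and pruning keeps those lists short.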

🔎 Similar Papers

CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding