CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting

📅 2025-06-01
🤖 AI Summary
Fruit counting in orchards is hampered by severe occlusion, semantic ambiguity, and the high computational cost of 3D reconstruction. To address these challenges, we propose a language-guided semantic Gaussian splatting method. Our approach introduces the first language-aligned semantic Gaussian representation, enabling zero-shot, prompt-driven 3D instance filtering. By combining radius-aware pruning, tiled rasterization, and distribution-aware sampling, it renders in real time (>30 FPS) without sacrificing accuracy. In addition, 3D spatial prompt filtering and density-based clustering support open-set semantic queries across fruit varieties. On real-world orchard data, the method achieves a counting error below 4.2%, significantly outperforming NeRF-based baselines. To the best of our knowledge, this is the first framework to enable open-vocabulary, semantically controllable, high-accuracy, real-time 3D fruit counting.

📝 Abstract
Accurate fruit counting in real-world agricultural environments is a longstanding challenge due to visual occlusions, semantic ambiguity, and the high computational demands of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed and limited generalization, and they lack support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data demonstrate that FruitLangGS achieves higher rendering speed, greater semantic flexibility, and better counting accuracy than prior approaches, offering a new perspective on language-driven, real-time neural rendering in open-world scenarios.
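
The last two steps of the pipeline (sampling points from the selected Gaussians, then clustering them to obtain a count) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_points` and `dbscan_count`, the fixed number of samples per Gaussian, and the `eps` and `min_pts` values are all assumptions chosen for illustration, and a plain O(n²) DBSCAN stands in for whatever density-based clustering the paper actually uses.

```python
import numpy as np

def sample_points(means, covs, n_per_gaussian=20, seed=0):
    """Draw points from each selected 3D Gaussian N(mean, cov).

    Assumption: a fixed sample budget per Gaussian; the paper's
    distribution-aware sampling may allocate samples differently.
    """
    rng = np.random.default_rng(seed)
    pts = [rng.multivariate_normal(m, c, size=n_per_gaussian)
           for m, c in zip(means, covs)]
    return np.vstack(pts)

def dbscan_count(points, eps=0.05, min_pts=5):
    """Count clusters with a simple O(n^2) DBSCAN; noise keeps label -1."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dists]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    n_clusters = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Start a new cluster at an unlabeled core point and grow it.
        labels[i] = n_clusters
        stack = [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = n_clusters
                    stack.append(k)
        n_clusters += 1
    return n_clusters, labels
```

Calling `sample_points` on the means and covariances of the prompt-selected Gaussians and passing the result to `dbscan_count` yields a cluster count that serves as the fruit estimate. In practice `eps` would need to be tuned to the fruit size in scene units, and the quadratic distance matrix would be replaced by a spatial index for orchard-scale point clouds.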

Problem

Research questions and friction points this paper is trying to address.

Real-time 3D fruit counting in occluded agricultural environments
Overcoming slow inference and limited generalization in existing methods
Enabling open-set semantic control via language-guided reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Gaussian splatting for efficient 3D reconstruction
CLIP-aligned semantic embedding for open-set control
Prompt-based 3D semantic filtering without segmentation
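
Prompt-based filtering in 3D can be pictured as a similarity test between each Gaussian's language embedding and the embedding of a text prompt. The sketch below is a hedged illustration only: `filter_gaussians`, the cosine-similarity criterion, and the `threshold` value are assumptions, and it presumes that the prompt embedding has already been mapped into the same (compressed) space as the per-Gaussian embeddings, which the summary does not describe.

```python
import numpy as np

def filter_gaussians(gauss_emb, prompt_emb, threshold=0.5):
    """Select Gaussians whose embedding is close to the prompt embedding.

    gauss_emb:  (N, D) array, one language embedding per Gaussian.
    prompt_emb: (D,) array, text embedding in the same space (assumed).
    Returns the indices of the selected Gaussians and all similarities.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    g = gauss_emb / (np.linalg.norm(gauss_emb, axis=1, keepdims=True) + eps)
    p = prompt_emb / (np.linalg.norm(prompt_emb) + eps)
    sim = g @ p  # cosine similarity per Gaussian
    return np.flatnonzero(sim >= threshold), sim
```

Because the selection operates on the Gaussians themselves, no per-view segmentation masks need to be produced or fused; changing the prompt only changes `prompt_emb`, and the reconstruction is reused as-is.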