🤖 AI Summary
Fruit counting in orchards faces challenges such as severe occlusion, semantic ambiguity, and the high computational cost of 3D reconstruction. To address these, we propose a language-guided semantic Gaussian splatting method. Our approach introduces the first language-aligned semantic Gaussian representation, enabling zero-shot, prompt-driven 3D instance filtering. Radius-aware pruning and tile-based rasterization yield real-time rendering (>30 FPS) without sacrificing accuracy. Prompt-based filtering applied directly in 3D space, followed by distribution-aware sampling and density-based clustering, supports open-set, cross-variety semantic queries. Evaluated on real-world orchard data, our method achieves a counting error below 4.2%, significantly outperforming NeRF-based baselines. To the best of our knowledge, this is the first framework enabling open-vocabulary, semantically controllable, high-accuracy, and real-time 3D fruit counting.
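The filter-then-count step described above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: each Gaussian's compressed language embedding is taken as already decoded into the same space as the prompt's text embedding, the cosine-similarity threshold of 0.8 is hypothetical, and a plain DBSCAN stands in for the unspecified "density-based clustering". The functions `filter_by_prompt`, `dbscan`, and `count_instances`, as well as the three-dimensional toy embeddings and coordinates, are illustrative only.

```python
"""Hypothetical sketch: prompt-based filtering of semantic Gaussians in 3D,
then density-based clustering (a minimal DBSCAN) to count fruit instances.
Embeddings, coordinates, and thresholds are toy assumptions, not FruitLangGS values."""
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_by_prompt(centers, embeddings, prompt_emb, threshold=0.8):
    """Keep Gaussian centers whose (decoded) language embedding matches the prompt."""
    return [c for c, e in zip(centers, embeddings) if cosine(e, prompt_emb) >= threshold]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 for noise, 0..k-1 for clusters."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force radius query; O(n^2) overall, acceptable for a sketch.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, so keep growing the cluster
    return labels


def count_instances(labels):
    """Number of clusters, ignoring noise."""
    return len({lab for lab in labels if lab >= 0})


# Toy scene: two fruit blobs plus two leaf Gaussians with a different embedding.
fruit_emb, leaf_emb = [1.0, 0.1, 0.0], [0.1, 1.0, 0.0]
centers = [
    (0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, 0.0, 0.05),
    (1.0, 0.0, 0.0), (1.05, 0.0, 0.0), (1.0, 0.05, 0.0), (1.0, 0.0, 0.05),
    (0.5, 0.5, 0.0), (0.55, 0.5, 0.0),
]
embeddings = [fruit_emb] * 8 + [leaf_emb] * 2
prompt_emb = [1.0, 0.0, 0.0]  # stand-in for a CLIP text embedding of "apple"

selected = filter_by_prompt(centers, embeddings, prompt_emb)
labels = dbscan(selected, eps=0.1, min_pts=3)
print(count_instances(labels))  # 2
```

Filtering before clustering means the density step only ever sees prompt-matched Gaussians, so leaves and branches never need their own clusters; in this toy scene the two leaf Gaussians are discarded and the two fruit blobs yield a count of 2.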
📝 Abstract
Accurate fruit counting in real-world agricultural environments is a longstanding challenge due to visual occlusions, semantic ambiguity, and the high computational demands of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed and limited generalization, and they lack support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data demonstrate that FruitLangGS achieves faster rendering, greater semantic flexibility, and higher counting accuracy than prior approaches, offering a new perspective for language-driven, real-time neural rendering in open-world scenarios.
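The abstract does not specify how "distribution-aware sampling" allocates points, so the sketch below is a hypothetical illustration rather than the FruitLangGS procedure. It assumes each selected Gaussian is described by a mean, a 3×3 covariance Σ, and an opacity; it gives each Gaussian a share of the point budget proportional to opacity × sqrt(det Σ) (an assumed proxy for its visible mass) and draws points from N(μ, Σ) through a Cholesky factor. The helpers `sample_points`, `cholesky3`, `det3`, and `isotropic` are illustrative names.

```python
"""Hypothetical sketch of distribution-aware sampling: convert selected 3D Gaussians
into a dense point cloud by drawing points from each Gaussian's own distribution.
The budget rule (opacity * sqrt(det(cov))) is an assumption, not the paper's method."""
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cholesky3(cov):
    """Lower-triangular L with L @ L.T == cov, for a 3x3 SPD covariance."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_points(gaussians, total, seed=0):
    """gaussians: list of (mean, cov, opacity). Returns roughly `total` 3D points."""
    rng = random.Random(seed)
    # Assumed allocation: more opaque and larger-volume Gaussians get more points.
    weights = [opacity * math.sqrt(det3(cov)) for _, cov, opacity in gaussians]
    wsum = sum(weights)
    points = []
    for (mean, cov, _), w in zip(gaussians, weights):
        n = max(1, round(total * w / wsum))
        L = cholesky3(cov)
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(3)]
            # x = mean + L z is distributed as N(mean, cov).
            points.append(tuple(mean[r] + sum(L[r][c] * z[c] for c in range(3))
                                for r in range(3)))
    return points


def isotropic(sigma):
    """Diagonal covariance with standard deviation `sigma` on every axis."""
    v = sigma * sigma
    return [[v, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, v]]


gaussians = [((0.0, 0.0, 0.0), isotropic(0.02), 0.9),
             ((1.0, 0.0, 0.0), isotropic(0.01), 0.9)]
cloud = sample_points(gaussians, total=900)
print(len(cloud))  # 900: 800 from the larger Gaussian, 100 from the smaller one
```

Sampling from each Gaussian's own covariance keeps the point density roughly consistent with the reconstructed shape, which is presumably what lets the subsequent clustering step separate neighboring fruits; because of per-Gaussian rounding, the returned count can differ slightly from `total`.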