Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low 3D segmentation accuracy and cross-view semantic inconsistency caused by severe occlusion and large scale variations in industrial point clouds, this paper proposes a top-down, two-stage hierarchical image-guided segmentation framework. In the first stage, instance-level coarse segmentation is achieved via multi-view rendering coupled with YOLO-World and SAM. In the second stage, fine-grained refinement is performed through point cloud back-projection and part-level Bayesian fusion. A novel multi-view Bayesian update mechanism significantly improves cross-view consistency and boundary precision. Crucially, the method requires no dense 3D annotations; only inexpensive 2D image supervision is needed. Evaluated on a real-world factory dataset, it achieves consistent mIoU improvements across all categories, and its generalization and robustness are further validated on public benchmarks.
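The back-projection step described above (lifting 2D masks onto the 3D point cloud) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera with intrinsics `K` and extrinsics `(R, t)`, and the helper name `backproject_mask` is hypothetical.

```python
import numpy as np

def backproject_mask(points, K, R, t, mask):
    """Assign each 3D point the 2D mask label at its projected pixel.

    points: (N, 3) world coordinates; K: (3, 3) intrinsics;
    R: (3, 3) rotation, t: (3,) translation (world -> camera);
    mask: (H, W) integer label image (e.g. from SAM).
    Returns (N,) labels; -1 marks points outside the image or behind the camera.
    """
    cam = points @ R.T + t              # world -> camera coordinates
    uv = cam @ K.T                      # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]         # perspective divide
    H, W = mask.shape
    u = np.round(uv[:, 0]).astype(int)  # column index
    v = np.round(uv[:, 1]).astype(int)  # row index
    labels = np.full(len(points), -1, dtype=int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (cam[:, 2] > 0)
    labels[valid] = mask[v[valid], u[valid]]
    return labels
```

In the actual pipeline this lookup would run once per rendered view, producing one candidate label per point per view for the subsequent fusion stage.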

📝 Abstract
Reliable 3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects, as commonly seen in industrial environments. In such scenarios, heavy occlusion weakens geometric boundaries between objects, and large differences in object scale cause end-to-end models to fail to capture both coarse and fine details accurately. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. To address these challenges, we propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance level to part level. Instance segmentation involves rendering a top-view image and projecting SAM-generated masks, prompted by YOLO-World, back onto the 3D point cloud. Part-level segmentation is subsequently performed by rendering multi-view images of each instance obtained from the previous stage and applying the same 2D segmentation and back-projection process at each view, followed by Bayesian updating fusion to ensure semantic consistency across views. Experiments on real-world factory data demonstrate that our method effectively handles occlusion and structural complexity, achieving consistently high per-class mIoU scores. Additional evaluations on a public dataset confirm the generalization ability of our framework, highlighting its robustness, annotation efficiency, and adaptability to diverse 3D environments.
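The abstract does not spell out the exact Bayesian update rule, but a common form of multi-view fusion treats each view's per-point class probabilities as independent likelihoods and multiplies them against a prior (naive-Bayes fusion in log space for numerical stability). The sketch below illustrates that idea; `bayesian_fuse` is an assumed helper name, not the paper's API.

```python
import numpy as np

def bayesian_fuse(view_probs, prior=None):
    """Fuse per-view class probabilities for each 3D point via Bayesian updating.

    view_probs: (V, N, C) array of per-view class probabilities
                for N points and C classes across V views.
    prior: (C,) class prior; uniform if None.
    Returns (N, C) normalized posterior probabilities.
    """
    V, N, C = view_probs.shape
    if prior is None:
        prior = np.full(C, 1.0 / C)
    # Start each point's log-posterior at the log prior.
    log_post = np.tile(np.log(prior), (N, 1))
    for v in range(V):
        # Multiply likelihoods in log space; clip to avoid log(0).
        log_post += np.log(np.clip(view_probs[v], 1e-9, 1.0))
    # Normalize with the log-sum-exp trick.
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return post
```

Under this scheme, views that agree reinforce one another while a single occluded or inconsistent view is down-weighted by the product of the remaining likelihoods, which is one plausible way the cross-view consistency described above could be enforced.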
Problem

Research questions and friction points this paper is trying to address.

Addresses occlusion and scale issues in industrial 3D segmentation
Reduces costly annotations in 3D point cloud segmentation
Ensures semantic consistency across multi-view image guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical image-guided 3D segmentation framework
Multi-view Bayesian fusion for semantic consistency
SAM and YOLO-World for 2D mask generation