🤖 AI Summary
Existing approaches to robotic grasping in 3D open-world environments suffer from domain shift across cameras and robots from different vendors, and their clustering methods generalize poorly. Method: We propose a training-free binary clustering framework that fuses multi-source, heterogeneous 3D point cloud segmentation outputs to localize unknown objects through unsupervised clustering and grasp them robustly. Contribution/Results: Our work introduces the first training-free, plug-and-play paradigm for cross-device 3D point cloud processing, compatible with arbitrary 3D sensors. We design a lightweight binary clustering algorithm that does not rely on prior distribution assumptions or scene-specific constraints. Evaluated across multiple robot platforms, diverse camera models, and cluttered, densely stacked scenes, our method significantly improves zero-shot grasping success rates, which indicates strong generalizability and deployment efficiency.
📝 Abstract
Robotic grasping in the open world is a critical component of manufacturing and automation processes. Many existing approaches rely on 2D segmentation output to guide grasping, but accurately recovering depth from 2D imagery remains a challenge and often limits performance in complex stacking scenarios. In contrast, techniques that use 3D point cloud data capture depth information directly, which lets them handle a diverse range of complex stacking scenes. However, such efforts are considerably hindered by the variance among data capture devices and by the unstructured nature of the data, both of which limit generalizability. Consequently, much research concentrates narrowly on handling designated objects within specific settings, which restricts real-world applicability. This paper presents a novel pipeline that can execute object grasping tasks in open-world scenarios, even on previously unseen objects, without any training. Additionally, our pipeline supports the flexible use of different 3D point cloud segmentation models across a variety of scenes. Building on the segmentation results, we propose a training-free binary clustering algorithm that not only improves segmentation precision but can also cluster and localize unseen objects for grasping. In our experiments, we investigate a range of open-world scenarios, and the results show that our pipeline is robust and generalizes consistently across various environments, robots, cameras, and objects. The code will be made available upon acceptance of the paper.
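The abstract does not specify how the binary clustering step works internally, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a segment produced by some 3D segmentation model may still contain two touching objects, and it splits that segment into two clusters with plain 2-means, using the cluster centers as rough localization points. The function name `binary_cluster`, the parameter `n_iter`, and the synthetic point cloud are all hypothetical and chosen for illustration.

```python
import numpy as np


def binary_cluster(points, n_iter=20):
    """Split an (N, 3) point array into two clusters with plain 2-means.

    Illustrative stand-in only; the paper's actual algorithm is not
    described in the abstract.
    """
    points = np.asarray(points, dtype=float)

    # Deterministic initialization (an assumption of this sketch): take the
    # two extreme points along the direction of largest variance.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    centers = np.stack([points[proj.argmin()], points[proj.argmax()]])

    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Recompute each center as the mean of its assigned points.
        new_centers = centers.copy()
        for k in range(2):
            mask = labels == k
            if mask.any():
                new_centers[k] = points[mask].mean(axis=0)

        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels, centers


# Synthetic segment: two small blobs 0.3 units apart along x (made-up data).
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0, 0.0], 0.01, size=(100, 3))
blob_b = rng.normal([0.3, 0.0, 0.0], 0.01, size=(100, 3))
cloud = np.vstack([blob_a, blob_b])

labels, centers = binary_cluster(cloud)
print(centers)  # one approximate center per separated object
```

Because the split needs no learned parameters, a step of this kind can be applied to the output of any segmentation model, which matches the plug-and-play role the abstract describes. Which cluster receives label 0 depends on the sign of the principal axis, so downstream code should not rely on label order.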