🤖 AI Summary
This work addresses the challenge of robotic manipulation in heavily occluded 3D point clouds by proposing an end-to-end Transformer framework that jointly predicts 6-degree-of-freedom poses and task-relevant parameters—such as gripper aperture—for multiple object categories. The approach introduces a pose-aware, parameterized perception paradigm, using modular category-specific prediction heads to unify geometric localization with the estimation of manipulation-relevant attributes. This design enables seamless extension to novel object types without architectural redesign. Trained exclusively on synthetic data, the method achieves a mean Average Precision (mAP) of 0.919 on real-world outdoor LiDAR scans and has been successfully deployed on an autonomous forklift platform, demonstrating strong generalization and practical utility in real-world conditions.
📝 Abstract
We present PIRATR, an end-to-end 3D object detection framework for robotic applications operating on point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties of parametric objects, such as a gripper's opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without redesigning the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a paradigm of pose-aware, parameterized perception that bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems deployable in dynamic robotic environments. Code available at https://github.com/swingaxe/piratr.
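To make the "modular, class-specific heads" idea concrete, here is a minimal sketch of the output structure such a design implies. This is not the authors' code: the class names, the `Detection` container, and the averaging "regressor" are hypothetical stand-ins; the point is only that each category contributes its own head mapping a shared query embedding to a 6-DoF pose plus optional parametric attributes, so supporting a new object type amounts to registering one more head.

```python
# Hypothetical sketch (not PIRATR's actual implementation): per-class heads
# that turn a shared query embedding into a pose plus class-specific parameters.
from dataclasses import dataclass, field

@dataclass
class Detection:
    category: str
    pose_6dof: tuple                 # (x, y, z, roll, pitch, yaw)
    params: dict = field(default_factory=dict)  # class-specific attributes

class ClassHead:
    """Base head: predicts geometry only, no extra attributes."""
    def __init__(self, category: str):
        self.category = category

    def predict_params(self, embedding):
        return {}

    def __call__(self, embedding, pose):
        return Detection(self.category, pose, self.predict_params(embedding))

class GripperHead(ClassHead):
    """Parametric head: additionally estimates the gripper's opening."""
    def predict_params(self, embedding):
        # Stand-in for a learned regressor over the query embedding.
        return {"opening_m": max(0.0, sum(embedding) / len(embedding))}

# Extending to a new object type = adding one head, no pipeline redesign.
heads = {
    "pallet": ClassHead("pallet"),
    "gripper": GripperHead("gripper"),
}

det = heads["gripper"]([0.2, 0.4, 0.6], (1.0, 2.0, 0.5, 0.0, 0.0, 1.57))
```

Here `det` carries both the pose and the estimated opening, which downstream logic could use to adjust the object's 3D model by the simple predefined rules the abstract mentions.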