🤖 AI Summary
This work addresses the challenge of 6-degree-of-freedom (6DoF) object pose estimation in industrial settings, where training data scarcity and the absence of instance-specific CAD models often hinder performance. Focusing on cuboid-shaped bins, the authors propose a lightweight pose estimation method that operates without requiring CAD models. They extend the 2D line segment detection network LeTR to structured point clouds for 3D line detection and leverage geometric constraints from top-edge features to robustly recover the full 6DoF pose. By incorporating synthetic data augmentation, the method achieves a translation error of 3 cm and a rotation error of 8.2° on real-world scanned data, significantly outperforming existing approaches. The study also introduces a newly annotated dataset to support future research in this domain.
📝 Abstract
The task of 6DoF object pose estimation is one of the fundamental problems of 3D vision with many practical applications such as industrial automation. Traditional deep learning approaches for this task often require extensive training data or CAD models, limiting their application in real-world industrial settings where data is scarce and object instances vary. We propose a novel method for 6DoF pose estimation focused specifically on bins used in industrial settings. We exploit the cuboid geometry of bins by first detecting intermediate 3D line segments corresponding to their top edges. Our approach extends the 2D line segment detection network LeTR to operate on structured point cloud data. The detected 3D line segments are then processed using a simple geometric procedure to robustly determine the bin's 6DoF pose. To evaluate our method, we extend an existing dataset with newly collected and annotated scans, which we make publicly available. We show that incorporating synthetic training data significantly improves pose estimation accuracy on real scans. Moreover, we show that our method significantly outperforms current state-of-the-art 6DoF pose estimation methods in terms of pose accuracy (3 cm translation error, 8.2$^\circ$ rotation error) while not requiring instance-specific CAD models during inference.
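The abstract does not spell out the "simple geometric procedure" that turns the detected top-edge segments into a pose. Below is a minimal, hedged sketch of one plausible way to do this for a cuboid bin, assuming the four top-edge segments are already available as 3D endpoint pairs; the function name and the exact recipe (centroid for translation, longest edge plus plane normal for rotation) are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def pose_from_top_edges(segments):
    """Estimate a 6DoF pose (R, t) from the four 3D line segments forming
    the top rectangle of a cuboid bin.

    `segments` is a (4, 2, 3) array of segment endpoints. This is an
    illustrative sketch, not the procedure from the paper.
    """
    pts = segments.reshape(-1, 3)            # 8 endpoints (corners, duplicated)
    t = pts.mean(axis=0)                     # translation: rectangle centre

    # Direction of the longest edge defines the bin's x-axis.
    dirs = segments[:, 1] - segments[:, 0]
    lengths = np.linalg.norm(dirs, axis=1)
    x = dirs[np.argmax(lengths)] / lengths.max()

    # Plane normal via SVD: the smallest principal axis of the corner cloud.
    centred = pts - t
    _, _, vt = np.linalg.svd(centred)
    z = vt[-1]
    if z[2] < 0:                             # orient the normal upwards
        z = -z

    # Orthonormalise: project x onto the top plane, y completes the frame.
    x = x - np.dot(x, z) * z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.column_stack([x, y, z])
    return R, t
```

In practice the detected segments would be noisy and possibly incomplete, so a robust variant (e.g. RANSAC over corner hypotheses, or least-squares rectangle fitting) would be needed; the sketch only shows the ideal-case geometry.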