🤖 AI Summary
Existing methods for robotic kitting in industrial automation suffer from limited accuracy and high computational overhead when tasks require fine-grained orientation alignment. Method: We propose Histogram Transporter, a framework that learns high-precision pick-and-place policies from scratch using only a few human demonstrations. It introduces rotation-equivariant orientation histograms (EOHs), extracted from visual observations with an efficient Fourier-based discretization. The EOHs model pick success probabilities over high-resolution orientations and also serve as local feature descriptors for object-to-placement matching. A subgroup alignment strategy in the place model compresses the full EOH spectrum into a compact orientation representation, so that matching stays efficient without sacrificing accuracy. Contribution/Results: On the simulated Hand-Tool Kitting Dataset (HTKD), the method outperforms competitive baselines in both task success rate and computational efficiency. It also adapts well to five Raven-10 tasks, and real-robot trials confirm its applicability to real-world deployment.
📝 Abstract
Robotic kitting is a critical task in industrial automation that requires the precise arrangement of objects into kits to support downstream production processes. However, when handling complex kitting tasks that involve fine-grained orientation alignment, existing approaches often suffer from limited accuracy and computational efficiency. To address these challenges, we propose Histogram Transporter, a novel kitting framework that learns high-precision pick-and-place actions from scratch using only a few demonstrations. First, our method extracts rotation-equivariant orientation histograms (EOHs) from visual observations using an efficient Fourier-based discretization strategy. These EOHs serve a dual purpose: they improve picking efficiency by directly modeling action success probabilities over high-resolution orientations, and they enhance placing accuracy by serving as local, discriminative feature descriptors for object-to-placement matching. Second, we introduce a subgroup alignment strategy in the place model that compresses the full spectrum of EOHs into a compact orientation representation, enabling efficient feature matching while preserving accuracy. Finally, we evaluate the proposed framework on the simulated Hand-Tool Kitting Dataset (HTKD), where it outperforms competitive baselines in both success rate and computational efficiency. Further experiments on five Raven-10 tasks demonstrate the adaptability of our approach, and real-robot trials confirm its applicability to real-world deployment.
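To make the idea of a Fourier-based orientation histogram more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that a planar orientation signal is stored as a few complex Fourier coefficients, that a histogram is obtained by sampling the signal at `n_bins` equally spaced angles and applying a softmax, and that a compact representation can be obtained by keeping only the bins of a cyclic subgroup. The function names (`fourier_to_histogram`, `rotate_coeffs`, `restrict_to_subgroup`), the bin counts, and the subgroup restriction are illustrative assumptions. The subgroup alignment strategy used in the paper may differ.

```python
import numpy as np


def fourier_to_histogram(coeffs, n_bins):
    """Sample a band-limited real function on the circle at n_bins angles.

    coeffs[k] is the complex coefficient of exp(i*k*theta), k = 0..K.
    Negative frequencies are implied by conjugate symmetry (real signal).
    A softmax turns the sampled scores into a histogram (an assumption here).
    """
    coeffs = np.asarray(coeffs, dtype=complex)
    k = np.arange(coeffs.shape[0])
    theta = 2.0 * np.pi * np.arange(n_bins) / n_bins
    basis = np.exp(1j * np.outer(theta, k))      # shape (n_bins, K + 1)
    weights = np.where(k == 0, 1.0, 2.0)         # count +k and -k together
    scores = (basis @ (weights * coeffs)).real
    exp = np.exp(scores - scores.max())          # numerically stable softmax
    return exp / exp.sum()


def rotate_coeffs(coeffs, phi):
    """Rotate the underlying signal by phi: c_k -> c_k * exp(-i*k*phi)."""
    coeffs = np.asarray(coeffs, dtype=complex)
    k = np.arange(coeffs.shape[0])
    return coeffs * np.exp(-1j * k * phi)


def restrict_to_subgroup(hist, m):
    """Keep the bins of C_n that lie in the subgroup C_m (m must divide n)."""
    n = hist.shape[0]
    if n % m != 0:
        raise ValueError("m must divide the number of bins")
    return hist[:: n // m]


# Hypothetical example: 4 Fourier coefficients (frequencies 0..3), 36 bins.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=4) + 1j * rng.normal(size=4)
n_bins, shift = 36, 5

hist = fourier_to_histogram(coeffs, n_bins)
rotated = fourier_to_histogram(
    rotate_coeffs(coeffs, 2.0 * np.pi * shift / n_bins), n_bins
)

# Equivariance: rotating the input circularly shifts the histogram.
equivariant = np.allclose(rotated, np.roll(hist, shift))

# Compact 4-bin representation taken from the 36-bin histogram.
compact = restrict_to_subgroup(hist, 4)
```

In this sketch the resolution of the histogram is decoupled from the number of stored coefficients: the same four coefficients can be sampled at 36 bins or at 360 bins. This is one way to read the claim that a Fourier-based discretization gives high-resolution orientations efficiently.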