🤖 AI Summary
Existing imitation learning methods for high-precision robotic insertion suffer from low accuracy, reliance on redundant image/point-cloud observations, and poor sample efficiency. To address these challenges, this paper proposes the first SE(3)-relative-pose-guided imitation learning framework. Our method unifies observation and action representation via SE(3) relative pose; introduces a target-conditioned RGB-D encoder coupled with a pose-guided residual gated fusion mechanism to adaptively integrate geometric priors and visual details; and jointly models policies via diffusion-based trajectory generation and SE(3) pose trajectory prediction. Evaluated on six fine-grained insertion tasks, our approach achieves ≈0.01 mm positioning accuracy using only 7–10 demonstrations—substantially outperforming state-of-the-art methods—while demonstrating strong generalization and unprecedented sample efficiency.
📝 Abstract
Recent studies have proved that imitation learning shows strong potential in the field of robotic manipulation. However, existing methods still struggle with precision manipulation task and rely on inefficient image/point cloud observations. In this paper, we explore to introduce SE(3) object pose into imitation learning and propose the pose-guided efficient imitation learning methods for robotic precise insertion task. First, we propose a precise insertion diffusion policy which utilizes the relative SE(3) pose as the observation-action pair. The policy models the source object SE(3) pose trajectory relative to the target object. Second, we explore to introduce the RGBD data to the pose-guided diffusion policy. Specifically, we design a goal-conditioned RGBD encoder to capture the discrepancy between the current state and the goal state. In addition, a pose-guided residual gated fusion method is proposed, which takes pose features as the backbone, and the RGBD features selectively compensate for pose feature deficiencies through an adaptive gating mechanism. Our methods are evaluated on 6 robotic precise insertion tasks, demonstrating competitive performance with only 7-10 demonstrations. Experiments demonstrate that the proposed methods can successfully complete precision insertion tasks with a clearance of about 0.01 mm. Experimental results highlight its superior efficiency and generalization capability compared to existing baselines. Code will be available at https://github.com/sunhan1997/PoseInsert.