Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning methods for high-precision robotic insertion suffer from low accuracy, reliance on redundant image/point-cloud observations, and poor sample efficiency. To address these challenges, this paper proposes an SE(3)-relative-pose-guided imitation learning framework. The method unifies the observation and action representation via the SE(3) pose of the source object relative to the target; introduces a goal-conditioned RGBD encoder coupled with a pose-guided residual gated fusion mechanism, in which pose features form the backbone and RGBD features adaptively compensate for their deficiencies; and models the policy with diffusion-based generation of SE(3) pose trajectories. Evaluated on six fine-grained insertion tasks, the approach completes insertions with clearances of about 0.01 mm using only 7-10 demonstrations, outperforming state-of-the-art baselines in sample efficiency and generalization.

📝 Abstract
Recent studies have shown that imitation learning holds strong potential for robotic manipulation. However, existing methods still struggle with precision manipulation tasks and rely on inefficient image/point-cloud observations. In this paper, we introduce the SE(3) object pose into imitation learning and propose a pose-guided, sample-efficient imitation learning method for the robotic precise insertion task. First, we propose a precise insertion diffusion policy that uses the relative SE(3) pose as the observation-action pair; the policy models the SE(3) pose trajectory of the source object relative to the target object. Second, we incorporate RGBD data into the pose-guided diffusion policy. Specifically, we design a goal-conditioned RGBD encoder to capture the discrepancy between the current state and the goal state. In addition, we propose a pose-guided residual gated fusion method, which takes pose features as the backbone while RGBD features selectively compensate for pose-feature deficiencies through an adaptive gating mechanism. Our method is evaluated on six robotic precise insertion tasks and achieves competitive performance with only 7-10 demonstrations. Experiments show that it can complete precision insertion tasks with a clearance of about 0.01 mm, and results highlight its superior efficiency and generalization compared to existing baselines. Code will be available at https://github.com/sunhan1997/PoseInsert.
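The policy's observation-action pair is the pose of the source object expressed relative to the target object. A minimal numpy sketch (not the authors' code; the frame conventions here are assumed) of how such a relative pose is obtained from two world-frame poses via homogeneous transforms:

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_target, T_source):
    """Pose of the source object expressed in the target object's frame:
    T_rel = T_target^{-1} @ T_source."""
    return np.linalg.inv(T_target) @ T_source

# Illustrative example: target rotated 90 degrees about z, source offset along x.
Rz = np.array([[0., -1., 0.],
               [1.,  0., 0.],
               [0.,  0., 1.]])
T_target = se3(Rz, np.array([0.1, 0.0, 0.0]))
T_source = se3(np.eye(3), np.array([0.2, 0.0, 0.0]))
T_rel = relative_pose(T_target, T_source)
```

By construction, composing the target pose with `T_rel` recovers the source pose, which is why this representation is invariant to where the target sits in the workspace.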
Problem

Research questions and friction points this paper is trying to address.

Improving robotic precision insertion using SE(3) pose-guided imitation learning
Enhancing efficiency by integrating RGBD data with pose-guided diffusion policy
Achieving high accuracy (0.01 mm clearance) with minimal demonstrations (7-10)
Innovation

Methods, ideas, or system contributions that make the work stand out.

SE(3) pose-guided imitation learning for precise insertion
Goal-conditioned RGBD encoder captures the current-goal state discrepancy
Pose-guided residual gated fusion adaptively compensates features
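The fusion idea above can be sketched in a few lines of numpy. This is a hypothetical illustration, not the authors' implementation: the gate parameterization (`W_gate`, `b_gate`) and feature dimensions are assumptions. Pose features act as the backbone, and a sigmoid gate conditioned on both streams decides, per dimension, how much of the RGBD features to add back in as a residual:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_residual_fusion(pose_feat, rgbd_feat, W_gate, b_gate):
    """Pose-guided residual gated fusion (illustrative sketch):
    gate in (0, 1) is computed from both feature streams, and the RGBD
    features are added to the pose backbone scaled elementwise by the gate."""
    gate = sigmoid(np.concatenate([pose_feat, rgbd_feat]) @ W_gate + b_gate)
    return pose_feat + gate * rgbd_feat  # residual: fused = pose + g * rgbd

# Illustrative random features and parameters.
rng = np.random.default_rng(0)
d = 8
pose_feat = rng.standard_normal(d)
rgbd_feat = rng.standard_normal(d)
W_gate = rng.standard_normal((2 * d, d)) * 0.1
b_gate = np.zeros(d)
fused = gated_residual_fusion(pose_feat, rgbd_feat, W_gate, b_gate)
```

Because the gate is bounded in (0, 1), the fused features can never drift further from the pose backbone than the RGBD features themselves, matching the paper's framing of RGBD as a selective compensation signal rather than a co-equal input.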
Han Sun, School of Mechanical Engineering, Shanghai Jiao Tong University
Yizhao Wang, School of Mechanical Engineering, Shanghai Jiao Tong University
Zhenning Zhou, School of Mechanical Engineering, Shanghai Jiao Tong University
Shuai Wang, Shanghai Huawei Technologies Co., Ltd.
Qixin Cao, School of Mechanical Engineering, Shanghai Jiao Tong University