EasyInsert: A Data-Efficient and Generalizable Insertion Policy

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for high-precision insertion in cluttered environments suffer from poor generalization, heavy reliance on CAD models or large-scale annotated datasets, and limited adaptability to distant initial poses, novel objects, and real-world conditions. This paper proposes an end-to-end, lightweight insertion paradigm in which the relative pose (delta pose) between plug and socket serves as the sole control signal. The approach integrates a vision–pose encoder with a self-supervised delta pose regression network, employs a coarse-to-fine two-stage controller, and incorporates an automated real-world data collection system. Trained on only five hours of real-world data, the method attains over 90% zero-shot insertion success on 13 of 15 unseen objects (including USB-C, HDMI, and Ethernet cables). With minimal fine-tuning, using just one human demonstration plus four minutes of automatically collected data, it exceeds 90% success on all 15 objects.

📝 Abstract
Insertion is a highly challenging task that requires robots to operate with exceptional precision in cluttered environments. Existing methods often generalize poorly. They typically function only in restricted, structured environments, and frequently fail when the plug and socket are far apart, when the scene is densely cluttered, or when handling novel objects. They also rely on strong assumptions such as access to CAD models or a digital twin in simulation. To address these limitations, we propose EasyInsert, a framework that leverages the human intuition that the relative pose (delta pose) between plug and socket is sufficient for successful insertion, and employs efficient, automated real-world data collection with minimal human labor to train a generalizable model for relative pose prediction. During execution, EasyInsert follows a coarse-to-fine procedure based on the predicted delta pose and successfully performs various insertion tasks. EasyInsert demonstrates strong zero-shot generalization to unseen objects in cluttered environments, handling cases with significant initial pose deviations while maintaining high sample efficiency and requiring little human effort. In real-world experiments, with just 5 hours of training data, EasyInsert achieves over 90% success in zero-shot insertion for 13 out of 15 unseen objects, including challenging objects like Type-C cables, HDMI cables, and Ethernet cables. Furthermore, with only one human demonstration and 4 minutes of automatically collected data for fine-tuning, it reaches a success rate of over 90% for all 15 objects.
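The coarse-to-fine execution described in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' controller: the delta pose is reduced to a 3D translation, and `coarse_to_fine_insert`, `ToyScene`, the thresholds, and the noise model are all hypothetical names and values chosen for illustration. The loop takes full predicted steps while far from the socket, then switches to damped steps with re-prediction once close.

```python
import numpy as np


def coarse_to_fine_insert(predict_delta, move_by, coarse_tol=0.02,
                          fine_tol=0.001, fine_gain=0.5, max_steps=50):
    """Drive a plug toward a socket using only predicted delta poses.

    predict_delta() returns a length-3 array: the estimated socket-minus-plug
    offset in meters. move_by(step) commands a relative translation.
    All names and thresholds here are illustrative assumptions.
    """
    for step in range(max_steps):
        delta = predict_delta()
        dist = np.linalg.norm(delta)
        if dist < fine_tol:
            return True, step  # predicted to be aligned within tolerance
        # Coarse stage: far away, take the full predicted step.
        # Fine stage: close in, take damped steps and re-predict each time.
        gain = 1.0 if dist > coarse_tol else fine_gain
        move_by(gain * delta)
    return False, max_steps


class ToyScene:
    """Stand-in for camera + model + robot (hypothetical, for illustration)."""

    def __init__(self, plug, socket, rel_noise=0.1, seed=0):
        self.plug = np.asarray(plug, dtype=float)
        self.socket = np.asarray(socket, dtype=float)
        self.rel_noise = rel_noise
        self.rng = np.random.default_rng(seed)

    def predict_delta(self):
        true_delta = self.socket - self.plug
        # Assumed noise model: error shrinks as the plug nears the socket.
        scale = self.rel_noise * np.linalg.norm(true_delta)
        return true_delta + self.rng.normal(scale=scale, size=3)

    def move_by(self, step):
        self.plug = self.plug + step


scene = ToyScene(plug=[0.2, -0.1, 0.15], socket=[0.0, 0.0, 0.0])
success, steps = coarse_to_fine_insert(scene.predict_delta, scene.move_by)
print(success, steps, np.linalg.norm(scene.socket - scene.plug))
```

A real system would predict a full 6-DoF delta pose from images and would need contact handling during the final insertion, neither of which this sketch models.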
Problem

Research questions and friction points this paper is trying to address.

Robotic insertion methods generalize poorly in cluttered environments
Existing methods fail with novel objects or large pose deviations
High reliance on CAD models or simulation limits practicality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses delta pose prediction for insertion tasks
Employs automated real-world data collection
Achieves high zero-shot generalization success
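To make the notion of a delta pose label concrete, the sketch below computes the relative transform between two poses and generates labeled samples around a known goal pose. The source does not describe the actual collection procedure, so this is an assumption-laden illustration: `make_pose`, `delta_pose`, `sample_labeled_poses`, the yaw-only rotation, and the perturbation ranges are all hypothetical. The idea shown is that if the inserted (goal) pose is known, for example from one demonstration, any randomly perturbed pose comes with its delta pose label for free.

```python
import numpy as np


def make_pose(yaw, translation):
    """Build a 4x4 homogeneous transform from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def delta_pose(T_plug, T_goal):
    """Goal pose expressed in the plug frame: T_plug @ delta == T_goal."""
    return np.linalg.inv(T_plug) @ T_goal


def sample_labeled_poses(T_goal, n, max_offset=0.05, max_yaw=0.3, seed=0):
    """Sample perturbed plug poses around T_goal with their delta pose labels.

    Perturbation ranges (meters, radians) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n):
        offset = rng.uniform(-max_offset, max_offset, size=3)
        yaw = rng.uniform(-max_yaw, max_yaw)
        T_plug = T_goal @ make_pose(yaw, offset)
        samples.append((T_plug, delta_pose(T_plug, T_goal)))
    return samples


T_goal = make_pose(0.4, [0.5, 0.1, 0.2])
dataset = sample_labeled_poses(T_goal, n=5)
print(len(dataset), dataset[0][1].shape)
```

In a real setup each sampled pose would be paired with a camera image captured at that pose, and the image, not the pose itself, would be the model input.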
Authors
Guanghe Li
Tsinghua University, Shanghai AI Laboratory, Shanghai Qi Zhi Institute, Jilin University
Junming Zhao
Nanjing University
Metamaterial metasurface
Shengjie Wang
Tsinghua University
Robotics, Reinforcement learning, Bionic robotics
Yang Gao
Tsinghua University, Shanghai AI Laboratory, Shanghai Qi Zhi Institute