🤖 AI Summary
Mixed-quality demonstration data is used inefficiently in robotic manipulation, which makes policy training unreliable. To address this, we propose S2I, a segment-level selection and optimization framework that combines demonstration segmentation, contrastive learning-based segment filtering, and trajectory optimization. With as few as three high-quality expert trajectories as guidance, S2I enables low-quality demonstrations to be reused efficiently. It works as a plug-and-play addition to imitation learning and is compatible with mainstream behavioral cloning (BC) and generative adversarial imitation learning (GAIL) policies. Evaluated on six manipulation tasks in simulation and on real robots, S2I consistently improves downstream policy performance compared with training directly on mixed-quality data. The approach offers a cost-effective way to reuse robotic demonstration data.
📝 Abstract
Data is crucial for robotic manipulation, as it underpins the development of robotic systems for complex tasks. While high-quality, diverse datasets enhance the performance and adaptability of robotic manipulation policies, collecting extensive expert-level data is resource-intensive. Consequently, many current datasets suffer from quality inconsistencies due to operator variability, highlighting the need for methods that use mixed-quality data effectively. To mitigate these issues, we propose "Select Segments to Imitate" (S2I), a framework that selects and optimizes mixed-quality demonstration data at the segment level while remaining plug-and-play compatible with existing robotic manipulation policies. The framework has three components: demonstration segmentation, which divides the original data into meaningful segments; segment selection, which uses contrastive learning to find high-quality segments; and trajectory optimization, which refines suboptimal segments for better policy learning. We evaluate S2I through comprehensive experiments in simulation and real-world environments across six tasks, demonstrating that with only 3 expert demonstrations for reference, S2I can improve the performance of various downstream policies when trained with mixed-quality demonstrations. Project website: https://tonyfang.net/s2i/.
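The three-stage structure described in the abstract (segment, select, refine) can be illustrated with a toy sketch. This is not the S2I implementation: the fixed-length splitting, the hand-crafted feature that stands in for a learned contrastive embedding, the straight-line interpolation that stands in for trajectory optimization, and every function name, parameter, and threshold below are assumptions made purely for illustration.

```python
from math import dist, sqrt

def split_segments(traj, seg_len):
    """Stage 1 (assumed): cut a trajectory into consecutive fixed-length segments."""
    return [traj[i:i + seg_len] for i in range(0, len(traj), seg_len)]

def features(segment):
    """Hand-crafted stand-in for a learned embedding: (path length, net displacement)."""
    path_len = sum(dist(a, b) for a, b in zip(segment, segment[1:]))
    net_disp = dist(segment[0], segment[-1])
    return (path_len, net_disp)

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def quality(segment, expert_segments):
    """Stage 2 (assumed): score a segment by its best similarity to any expert segment."""
    f = features(segment)
    return max(cosine(f, features(e)) for e in expert_segments)

def refine(segment):
    """Stage 3 (placeholder): replace a segment by a straight line between its endpoints."""
    n = len(segment)
    if n < 3:
        return list(segment)
    first, last = segment[0], segment[-1]
    return [tuple(a + (b - a) * t / (n - 1) for a, b in zip(first, last))
            for t in range(n)]

def select_and_refine(traj, expert_segments, seg_len=4, threshold=0.95):
    """Keep segments that resemble expert data; refine the rest."""
    result = []
    for seg in split_segments(traj, seg_len):
        if quality(seg, expert_segments) >= threshold:
            result.append(("kept", list(seg)))
        else:
            result.append(("refined", refine(seg)))
    return result

# Hypothetical 2-D end-effector positions: one expert segment, one mixed-quality demo.
expert = [[(0, 0), (1, 0), (2, 0), (3, 0)]]
demo = [(0, 0), (1, 0), (2, 0), (3, 0),      # direct motion
        (3, 0), (3, 2), (3, -2), (4, 0)]     # wandering motion
processed = select_and_refine(demo, expert)
labels = [label for label, _ in processed]
```

In this toy example the direct segment is kept unchanged, while the wandering segment is replaced by a straight path between its own endpoints. A downstream policy would then be trained on the concatenated output, which is what makes this kind of preprocessing independent of the policy architecture.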