🤖 AI Summary
In robot imitation learning, demonstration quality critically influences closed-loop policy performance, yet the impact of individual demonstrations on task success remains poorly understood. This paper introduces CUPID, a data curation method built on a novel influence-function formulation that quantifies each training demonstration's contribution to the policy's expected return, enabling principled filtering of harmful demonstrations and active selection of high-value trajectories. The method combines influence-function theory, policy-gradient analysis for imitation learning, return estimation from evaluation rollouts, and subset selection, and is validated extensively in simulation (RoboMimic) and on real robotic platforms. Experiments show that training on less than 33% of carefully curated demonstrations can yield state-of-the-art diffusion policies, while hardware results further demonstrate robustness under distribution shift, identification of spurious correlations, and improved post-training of generalist robot policies.
📝 Abstract
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy's expected return. This enables ranking and selection of demonstrations according to their impact on the policy's closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Additional materials are made available at: https://cupid-curation.github.io.
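As a rough illustration of the influence-function idea behind this kind of data curation (a toy sketch, not CUPID's actual formulation), the classic approximation scores each training point by how upweighting it would change a held-out objective: the influence of point $i$ is $I_i = -g_{\text{val}}^\top H^{-1} g_i$, where $H$ is the training-loss Hessian, $g_i$ the point's gradient, and $g_{\text{val}}$ the held-out gradient. Points with large positive influence on the held-out loss are candidates for filtering. The sketch below uses ridge regression with a corrupted training point, with validation loss standing in for the policy's expected return:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
y[0] += 5.0  # corrupt one "demonstration": it should rank as harmful

lam = 1e-2
H = X.T @ X / n + lam * np.eye(d)       # Hessian of the ridge training loss
w = np.linalg.solve(H, X.T @ y / n)     # closed-form ridge solution

X_val = rng.normal(size=(10, d))
y_val = X_val @ w_true                  # clean held-out set
g_val = X_val.T @ (X_val @ w - y_val) / len(y_val)  # held-out loss gradient

residuals = X @ w - y
grads = X * residuals[:, None]          # per-point training-loss gradients
# Influence of upweighting point i on the held-out loss:
#   I_i = -g_val^T H^{-1} g_i   (positive => upweighting point i hurts)
influence = -(grads @ np.linalg.solve(H, g_val))

harmful = np.argsort(-influence)[:5]    # most harmful candidates to filter
print("most harmful training indices:", harmful)
```

Retraining on the dataset with the top-ranked harmful points removed is the filtering step; the same scores, computed for newly collected trajectories, would support the subselection step.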