🤖 AI Summary
In robot imitation learning, demonstration quality critically influences closed-loop policy performance, yet the impact of individual demonstrations on task success remains poorly understood. This paper introduces CUPID, a data curation method built on a novel influence-function formulation that quantifies each training demonstration's contribution to the policy's expected return, enabling principled filtering of harmful demonstrations and active selection of high-value trajectories. The method combines influence-function theory, policy-gradient analysis for imitation learning, return estimation from evaluation rollouts, and subset selection, and is validated extensively in simulation (RoboMimic) and on real robotic platforms. Experiments show that training on less than 33% of carefully curated demonstrations can yield state-of-the-art diffusion policies, while hardware results further demonstrate robustness under distribution shift, identification of spurious correlations, and improved post-training of generalist robot policies.
📝 Abstract
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy's expected return. This enables ranking and selection of demonstrations according to their impact on the policy's closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Additional materials are made available at: https://cupid-curation.github.io.
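As a rough illustration of the influence-function idea behind this kind of data curation (a toy sketch, not CUPID's actual formulation), the classic approximation scores each training point by how upweighting it would change a held-out objective: the influence of point $i$ is $I_i = -g_{\text{val}}^\top H^{-1} g_i$, where $H$ is the training-loss Hessian, $g_i$ the point's gradient, and $g_{\text{val}}$ the held-out gradient. Points with large positive influence on the held-out loss are candidates for filtering. The sketch below uses ridge regression with a corrupted training point, with validation loss standing in for the policy's expected return:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
y[0] += 5.0  # corrupt one "demonstration": it should rank as harmful

lam = 1e-2
H = X.T @ X / n + lam * np.eye(d)       # Hessian of the ridge training loss
w = np.linalg.solve(H, X.T @ y / n)     # closed-form ridge solution

X_val = rng.normal(size=(10, d))
y_val = X_val @ w_true                  # clean held-out set
g_val = X_val.T @ (X_val @ w - y_val) / len(y_val)  # held-out loss gradient

residuals = X @ w - y
grads = X * residuals[:, None]          # per-point training-loss gradients
# Influence of upweighting point i on the held-out loss:
#   I_i = -g_val^T H^{-1} g_i   (positive => upweighting point i hurts)
influence = -(grads @ np.linalg.solve(H, g_val))

harmful = np.argsort(-influence)[:5]    # most harmful candidates to filter
print("most harmful training indices:", harmful)
```

Retraining on the dataset with the top-ranked harmful points removed is the filtering step; the same scores, computed for newly collected trajectories, would support the subselection step.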