DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

📅 2025-05-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Efficiently selecting task-relevant subsets from large-scale prior datasets remains challenging in robot imitation learning. Method: This paper proposes a policy-driven, end-to-end data selection paradigm that optimizes directly for task success rate. It leverages gradient analysis of policy networks and a task-specific surrogate loss that is computationally efficient and rollout-free to identify data points that either improve or degrade performance. Built on the datamodels framework, the approach supports both simulation and real-world deployment. Results: Evaluated across more than 60 simulated and real-world manipulation tasks, it consistently improves task success rates and outperforms multiple baselines, including when selecting from the Open X-Embodiment datasets. Its core contribution is framing data selection as an end-to-end, performance-aware optimization problem, enabling automatic, objective assessment of data quality.
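The datamodels idea underlying the method can be illustrated with a toy sketch. This is not the authors' implementation; all names and details below are illustrative assumptions. The recipe: train proxies on random subsets of a prior dataset, record a surrogate loss on task-specific data for each subset (here simulated with synthetic per-point effects), fit a linear datamodel from subset-inclusion masks to that loss, and keep the points whose estimated inclusion most reduces it:

```python
# Toy datamodel-style data selection (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_points = 50        # size of the prior dataset (toy)
n_subsets = 400      # number of random subsets / proxy training runs

# Hypothetical ground-truth per-point effect on the surrogate loss:
# negative = helpful, positive = harmful (unknown to the selector).
true_effect = rng.normal(0.0, 1.0, n_points)

# Which prior data points each random subset includes.
masks = rng.random((n_subsets, n_points)) < 0.5

# Surrogate loss observed per subset: a stand-in for training a proxy
# policy on that subset and evaluating it rollout-free on task data.
losses = masks @ true_effect + rng.normal(0.0, 0.1, n_subsets)

# Fit the linear datamodel: loss ≈ masks @ w + b.
X = np.hstack([masks.astype(float), np.ones((n_subsets, 1))])
w, *_ = np.linalg.lstsq(X, losses, rcond=None)
scores = w[:n_points]  # estimated per-point effect on the surrogate loss

# Keep points predicted to reduce the loss; drop harmful ones.
selected = np.where(scores < 0)[0]
print(f"selected {len(selected)} of {n_points} points")
```

The same selection logic transfers to the real setting by replacing the synthetic losses with the surrogate loss of proxy policies evaluated on task-specific demonstrations, which is what makes the approach rollout-free.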

📝 Abstract
Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist robot policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that enhance the policy while dropping data that degrade it. To avoid performing expensive rollouts in the environment during selection, we use a novel surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on a suite of more than 60 simulation and real-world manipulation tasks, most notably showing successful data selection from the Open X-Embodiment datasets, demonstrating consistent gains in success rates and superior performance over multiple baselines. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics. More information at https://robin-lab.cs.utexas.edu/datamodels4imitation/

Problem

Research questions and friction points this paper is trying to address.

Optimizing data selection for robot imitation learning
Improving task-specific performance via curated datasets
Avoiding performance degradation with policy-driven data filtering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy-driven data selection framework DataMIL
End-to-end optimization for task success
Surrogate loss function avoids expensive rollouts