🤖 AI Summary
This work addresses the lack of quantitative metrics for trajectory quality assessment in robot imitation learning. We propose a mutual information–based trajectory evaluation method that jointly models state diversity and action predictability. Our approach innovatively integrates k-nearest-neighbor mutual information estimation with a lightweight VAE-based joint state-action embedding, enabling robust trajectory ranking from small-scale robot demonstration data. To our knowledge, this is the first method validated across both ALOHA and Franka simulation and real-world platforms, achieving strong correlation (ρ > 0.85) with human expert ratings. On the RoboMimic benchmark, policies trained exclusively on trajectories selected by our metric yield 5–10% performance gains; in real-robot tasks, success rates increase significantly. The method establishes an interpretable, cross-platform-consistent paradigm for data curation and active data acquisition in imitation learning.
📝 Abstract
The performance of imitation learning policies often hinges on the datasets with which they are trained. Consequently, investment in data collection for robotics has grown across both industrial and academic labs. However, despite the marked increase in the quantity of demonstrations collected, little work has sought to assess the quality of said data despite mounting evidence of its importance in other areas such as vision and language. In this work, we take a critical step towards addressing the data quality in robotics. Given a dataset of demonstrations, we aim to estimate the relative quality of individual demonstrations in terms of both state diversity and action predictability. To do so, we estimate the average contribution of a trajectory towards the mutual information between states and actions in the entire dataset, which precisely captures both the entropy of the state distribution and the state-conditioned entropy of actions. Though commonly used mutual information estimators require vast amounts of data often beyond the scale available in robotics, we introduce a novel technique based on k-nearest neighbor estimates of mutual information on top of simple VAE embeddings of states and actions. Empirically, we demonstrate that our approach is able to partition demonstration datasets by quality according to human expert scores across a diverse set of benchmarks spanning simulation and real world environments. Moreover, training policies based on data filtered by our method leads to a 5-10% improvement in RoboMimic and better performance on real ALOHA and Franka setups.