🤖 AI Summary
In interactive imitation learning, the heavy annotation burden on human teachers and the underutilization of the novice policy's own plans remain key challenges. To address these, the Active Skill-level Data Aggregation (ASkDAgger) framework lets the novice say "I plan to do this, but I am uncertain," and leverages teacher feedback on that plan through three components: (1) S-Aware Gating (SAG), which adjusts the query gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency against failure incidence, reduce the number of demonstration annotations required, improve generalization, and speed up adaptation to changing domains. The framework is validated on language-conditioned manipulation tasks in both simulation and real-world environments.
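To make the gating idea concrete, here is a minimal sketch of an S-Aware-style gate. The paper only states that the threshold is adjusted to track sensitivity, specificity, or a minimum success rate; the class name, update rule, and all parameters below are illustrative assumptions, not the authors' implementation.

```python
class SAwareGate:
    """Illustrative sketch (not the paper's algorithm): adjust an
    uncertainty threshold online so that the fraction of novice
    failures that trigger a teacher query tracks a target sensitivity."""

    def __init__(self, threshold=0.5, target_sensitivity=0.9, lr=0.05):
        self.threshold = threshold          # query when uncertainty exceeds this
        self.target = target_sensitivity    # desired fraction of failures queried
        self.lr = lr                        # step size for threshold updates

    def should_query(self, uncertainty):
        """Ask the teacher only when the novice is sufficiently uncertain."""
        return uncertainty > self.threshold

    def update(self, queried, novice_failed):
        """After observing the outcome, nudge the threshold.

        Only failures carry sensitivity information: a missed failure
        (not queried) lowers the threshold so future queries fire more
        readily; a caught failure lets the threshold relax slightly.
        """
        if not novice_failed:
            return
        hit = 1.0 if queried else 0.0
        self.threshold -= self.lr * (self.target - hit)
```

Under this toy rule, repeated missed failures drive the threshold down (more queries), while consistently caught failures let it drift up, trading query frequency against failure incidence.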
📝 Abstract
Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice's planned actions are not utilized despite containing valuable information, such as the novice's capabilities and the corresponding uncertainty levels. To leverage this information, we allow the novice to say: "I plan to do this, but I am uncertain." We introduce the Active Skill-level Data Aggregation (ASkDAgger) framework, which leverages teacher feedback on the novice plan in three key ways: (1) S-Aware Gating (SAG), which adjusts the gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of ASkDAgger through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://askdagger.github.io.
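The PIER component can be sketched as weighted replay sampling. The abstract only says that priority depends on uncertainty, novice success, and demonstration age; the additive form, weights, and function names below are assumptions for illustration, not the paper's actual priority.

```python
import random

def pier_priority(uncertainty, novice_success, age,
                  w_u=1.0, w_s=1.0, w_a=0.5):
    """Toy priority (form and weights are assumptions): uncertain,
    failed, and recent demonstrations are replayed more often."""
    failure_bonus = 0.0 if novice_success else 1.0  # failures matter more
    recency = 1.0 / (1.0 + age)                     # newer demos score higher
    return w_u * uncertainty + w_s * failure_bonus + w_a * recency

def sample_batch(buffer, k, rng=random):
    """Draw k transitions with probability proportional to priority."""
    weights = [pier_priority(t["uncertainty"], t["success"], t["age"])
               for t in buffer]
    return rng.choices(buffer, weights=weights, k=k)
```

With this scheme, a high-uncertainty failed demonstration from the current session outranks an old, confident success, concentrating replay on the experiences most likely to correct the policy.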