🤖 AI Summary
In active imitation learning (AIL), expert action labeling is prohibitively expensive, especially in GPU-intensive simulation, human-in-the-loop, and robot-fleet settings. To address this, we propose a state-novelty-based active querying method. Our approach introduces a task-agnostic, globally adaptive threshold via conformal prediction, uses the K-nearest-neighbor distance as the novelty score, and performs trajectory-level delayed querying via rejection sampling, eliminating the need for real-time expert intervention. The method drastically reduces query frequency while matching or exceeding expert-level performance on MuJoCo benchmarks: it reduces expert queries by up to 96% compared to DAgger and improves query efficiency over existing AIL methods by up to 65%. It also exhibits strong robustness to hyperparameter variation, aiding practical deployment across diverse robotic learning scenarios.
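The conformal-prediction threshold mentioned above can be sketched as follows. This is a minimal illustration of a standard split-conformal quantile rule, not the authors' code; the function name `conformal_threshold` and the finite-sample `(n+1)` correction are our assumptions about how such a calibration step is typically implemented.

```python
import numpy as np

def conformal_threshold(calib_scores, alpha=0.1):
    """Global query threshold: the empirical (1 - alpha) quantile of
    novelty scores collected on-policy during a calibration phase.
    Uses the finite-sample conformal quantile ceil((n+1)(1-alpha))/n,
    so roughly an alpha fraction of future scores exceed the threshold
    (i.e., alpha controls the expected query rate)."""
    n = len(calib_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    # method="higher" picks the next-larger order statistic (conservative)
    return np.quantile(calib_scores, q, method="higher")
```

Because the threshold is a single quantile of observed scores, `alpha` acts as a task-agnostic knob: no per-task distance scale needs to be tuned.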
📝 Abstract
Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets that revisit near-duplicate states. We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL), a querying rule that requests an expert action only when the visited state is under-represented in the expert-labeled dataset. CRSAIL scores state novelty by the distance to the $K$-th nearest expert state and sets a single global threshold via conformal prediction. This threshold is the empirical $(1-\alpha)$ quantile of on-policy calibration scores, providing a distribution-free calibration rule that links $\alpha$ to the expected query rate and makes $\alpha$ a task-agnostic tuning knob. This state-space querying strategy is robust to outliers and, unlike safety-gate-based AIL, can be run without real-time expert takeovers: we roll out full trajectories (episodes) with the learner and only afterward query the expert on a subset of visited states. Evaluated on MuJoCo robotics tasks, CRSAIL matches or exceeds expert-level reward while reducing total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods, with empirical robustness to $\alpha$ and $K$, easing deployment on novel systems with unknown dynamics.
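The novelty score and the trajectory-level delayed querying described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the names `knn_novelty` and `select_query_states` are ours, and the threshold `tau` is assumed to have been calibrated beforehand (e.g., as a quantile of on-policy novelty scores).

```python
import numpy as np

def knn_novelty(state, expert_states, k=5):
    """Novelty score: Euclidean distance from `state` to its k-th
    nearest neighbor among the expert-labeled states."""
    d = np.linalg.norm(expert_states - state, axis=1)
    return np.sort(d)[k - 1]

def select_query_states(trajectory, expert_states, tau, k=5):
    """Trajectory-level delayed querying: roll out the full episode
    with the learner first, then flag only the visited states whose
    novelty exceeds the calibrated threshold tau for expert labeling."""
    return [s for s in trajectory if knn_novelty(s, expert_states, k) > tau]
```

Because the expert is queried only after the episode ends, no real-time expert takeover is needed; states near existing expert data are rejected, so near-duplicate states incur no labeling cost.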