Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction

📅 2025-11-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In active imitation learning (AIL), expert action labeling often dominates training cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets. This paper proposes CRSAIL, a querying rule based on state novelty: the novelty score is the distance to the K-th nearest expert-labeled state, and a single global threshold is set via conformal prediction, with α serving as a task-agnostic tuning knob. Because queries are decided at the trajectory level after each rollout, no real-time expert intervention is needed. On MuJoCo benchmarks, the method matches or exceeds expert-level reward while cutting expert queries by up to 96% relative to DAgger and by up to 65% relative to prior AIL methods. It is also empirically robust to the choice of α and K, which eases deployment on new systems.

📝 Abstract
Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets that revisit near-duplicate states. We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL), a querying rule that requests an expert action only when the visited state is under-represented in the expert-labeled dataset. CRSAIL scores state novelty by the distance to the $K$-th nearest expert state and sets a single global threshold via conformal prediction. This threshold is the empirical $(1-α)$ quantile of on-policy calibration scores, providing a distribution-free calibration rule that links $α$ to the expected query rate and makes $α$ a task-agnostic tuning knob. This state-space querying strategy is robust to outliers and, unlike safety-gate-based AIL, can be run without real-time expert takeovers: we roll out full trajectories (episodes) with the learner and only afterward query the expert on a subset of visited states. Evaluated on MuJoCo robotics tasks, CRSAIL matches or exceeds expert-level reward while reducing total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods, with empirical robustness to $α$ and $K$, easing deployment on novel systems with unknown dynamics.
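
The scoring and thresholding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`knn_novelty`, `conformal_threshold`, `should_query`), the Euclidean metric, and the finite-sample rank `ceil((n + 1)(1 - alpha))` commonly used in split conformal prediction are all assumptions made for illustration; the abstract only states that the threshold is the empirical $(1-α)$ quantile of on-policy calibration scores.

```python
import numpy as np


def knn_novelty(states, expert_states, k):
    """Distance from each state to its k-th nearest expert-labeled state.

    Assumes Euclidean distance and k <= len(expert_states).
    """
    # Pairwise distances, shape (n_states, n_expert).
    diffs = states[:, None, :] - expert_states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # np.partition puts the k-th smallest entry of each row at index k - 1.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def conformal_threshold(calibration_scores, alpha):
    """Empirical (1 - alpha) quantile of the calibration scores.

    Uses the rank ceil((n + 1) * (1 - alpha)); this finite-sample
    correction is an assumption, not taken from the paper.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        # Too few calibration points for this alpha: never query.
        return np.inf
    return scores[rank - 1]


def should_query(states, expert_states, k, threshold):
    """Boolean mask: True where a state is novel enough to query the expert."""
    return knn_novelty(states, expert_states, k) > threshold
```

Under this reading, a smaller α pushes the threshold toward the largest calibration scores, so fewer visited states exceed it and fewer expert queries are issued.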
Problem

Research questions and friction points this paper is trying to address.

Reducing expert query costs in active imitation learning
Managing covariate shift without real-time expert intervention
Achieving sample efficiency while maintaining expert-level performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses conformal prediction to set a global novelty threshold
Queries expert only on underrepresented states to reduce cost
Runs full episodes before querying, avoiding real-time takeovers
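
The three points above can be combined into a short Python sketch of post-rollout querying. It rests on several assumptions not confirmed by the source: the helper names (`kth_nn_distance`, `query_after_rollout`), the `expert_policy` callable, the Euclidean metric, and the choice to score every visited state against the expert dataset as it stood before the rollout are all hypothetical.

```python
import numpy as np


def kth_nn_distance(states, expert_states, k):
    """Euclidean distance from each state to its k-th nearest expert state."""
    dists = np.linalg.norm(
        states[:, None, :] - expert_states[None, :, :], axis=-1
    )
    return np.sort(dists, axis=1)[:, k - 1]


def query_after_rollout(trajectory, expert_states, expert_actions,
                        expert_policy, threshold, k):
    """Label only the novel states of a completed learner rollout.

    trajectory:     states visited by the learner, shape (T, state_dim)
    expert_policy:  hypothetical callable mapping a state to an action array
    Returns the enlarged dataset and the number of expert queries issued.
    """
    # Scores are computed after the episode ends, so no expert takeover
    # is needed while the learner is acting.
    scores = kth_nn_distance(trajectory, expert_states, k)
    novel = trajectory[scores > threshold]
    if len(novel) == 0:
        return expert_states, expert_actions, 0
    # One expert query per retained state.
    new_actions = np.stack([expert_policy(s) for s in novel])
    return (
        np.concatenate([expert_states, novel]),
        np.concatenate([expert_actions, new_actions]),
        len(novel),
    )
```

In a training loop, the returned dataset would be used to refit the learner before the next rollout, and the returned count would be added to the running query budget.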
Arad Firouzkouhi
Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
Omid Mirzaeedodangeh
Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
Lars Lindemann
Assistant Professor of Algorithmic Systems Theory, ETH Zürich
Systems and Control Theory, Formal Methods, Machine Learning, Autonomous Systems