🤖 AI Summary
This work addresses the unreliability of learned robot policies at deployment, which stems from distribution shift, compounding errors, and task complexity. To mitigate these challenges, the authors propose a suite of deployment-time mechanisms: a runtime behavioral-consistency monitor that predicts impending failures without requiring failure data; an influence-function-based, data-centric explainability framework that traces policy performance back to individual training demonstrations; and a language-specified task execution module that integrates success-probability estimation with feasibility-aware planning. Together, these components substantially improve robustness and success rates on long-horizon and language-guided tasks, while also enabling interpretable policy diagnostics and targeted dataset refinement.
📝 Abstract
Recent advances in learning-based robot manipulation have produced policies with remarkable capabilities. Yet, reliability at deployment remains a fundamental barrier to real-world use, where distribution shift, compounding errors, and complex task dependencies collectively undermine system performance. This dissertation investigates how the reliability of learned robot policies can be improved at deployment time through mechanisms that operate around them. We develop three complementary classes of deployment-time mechanisms. First, we introduce runtime monitoring methods that detect impending failures by identifying inconsistencies in closed-loop policy behavior and deviations in task progress, without requiring failure data or task-specific supervision. Second, we propose a data-centric framework for policy interpretability that traces deployment-time successes and failures to influential training demonstrations using influence functions, enabling principled diagnosis and dataset curation. Third, we address reliable long-horizon task execution by formulating policy coordination as the problem of estimating and maximizing the success probability of behavior sequences, and we extend this formulation to open-ended, language-specified tasks through feasibility-aware task planning. By centering on core challenges of deployment, these contributions advance practical foundations for the reliable, real-world use of learned robot policies. Continued progress on these foundations will be essential for enabling trustworthy and scalable robot autonomy in the future.
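To make the second contribution concrete, here is a minimal sketch of influence-function-based data attribution in the style of the abstract's description. It uses the classic formula I(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i) on a simple ridge-regression surrogate; the model, function names, and data are illustrative assumptions, not the dissertation's actual method.

```python
import numpy as np

def influence_scores(X, y, x_test, y_test, reg=1e-2):
    """Approximate the effect of upweighting each training pair (x_i, y_i)
    on the test loss, via I(z_i, z_test) = -grad_test^T H^{-1} grad_i
    for squared loss on a linear model (a toy stand-in for a policy)."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)  # fitted weights
    H = (X.T @ X) / n + reg * np.eye(d)                      # loss Hessian
    grad_test = (x_test @ w - y_test) * x_test               # test-loss gradient
    H_inv_grad = np.linalg.solve(H, grad_test)               # H^{-1} grad_test
    residuals = X @ w - y                                    # per-sample errors
    grads = residuals[:, None] * X                           # per-sample gradients
    return -grads @ H_inv_grad                               # I(z_i, z_test)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
scores = influence_scores(X, y, X[0], y[0])
# Large positive scores flag demonstrations whose upweighting raises the
# test loss; these are candidates for dataset curation.
harmful = np.argsort(scores)[-5:]
```

Note the sign convention: a sample's influence on its own loss is non-positive (every demonstration helps itself), which is a quick sanity check for implementations like this.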
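The third contribution frames long-horizon execution as estimating and maximizing the success probability of a behavior sequence. A hedged sketch of that idea, under the (strong) simplifying assumption that per-skill success estimates are independent; the skill names, probabilities, and threshold are hypothetical:

```python
import math

def sequence_success_prob(seq, p_success):
    # Under the independence assumption, the probability that every skill in
    # the sequence succeeds is the product of the per-skill estimates.
    return math.prod(p_success[s] for s in seq)

def best_feasible_sequence(candidates, p_success, min_prob=0.0):
    # Feasibility-aware selection: discard candidate sequences whose estimated
    # success probability falls below a threshold, then return the argmax.
    feasible = [s for s in candidates
                if sequence_success_prob(s, p_success) >= min_prob]
    return max(feasible,
               key=lambda s: sequence_success_prob(s, p_success),
               default=None)

# Hypothetical per-skill success estimates and candidate plans.
p = {"grasp": 0.9, "place": 0.8, "push": 0.6}
candidates = [("grasp", "place"),
              ("push", "place"),
              ("grasp", "push", "place")]
best = best_feasible_sequence(candidates, p, min_prob=0.5)
# ("grasp", "place"): 0.9 * 0.8 = 0.72, the highest of the three
```

Returning `None` when no candidate clears the threshold is one simple way to surface infeasibility to a planner rather than executing a doomed sequence.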