Failure Identification in Imitation Learning Via Statistical and Semantic Filtering

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses failure detection for robot imitation learning in real-world deployment, where rare events such as hardware malfunctions or out-of-distribution states can disrupt performance, and where existing anomaly detection methods struggle to distinguish harmful failures from benign deviations. The authors propose FIDeL, a policy-agnostic failure identification module that combines statistical anomaly detection with semantic filtering by a vision-language model. FIDeL aligns incoming observations with a compact representation of nominal demonstrations via optimal transport matching, and derives spatio-temporal thresholds from an extension of conformal prediction. The study also presents BotFails, a multimodal dataset of real-world tasks for robotic failure detection. Reported results show a 5.30% improvement in AUROC for anomaly detection and a 17.38% gain in failure identification accuracy on BotFails compared with existing state-of-the-art methods.
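To make the optimal transport matching step concrete, the following is a minimal sketch, not the authors' implementation. It assumes observations and nominal demonstrations are already encoded as feature vectors, uses uniform mass on both sides, and solves an entropy-regularised problem with plain Sinkhorn iterations. The function names `sinkhorn_plan` and `ot_anomaly_scores`, the squared Euclidean cost, and the `reg` value are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularised transport plan between two uniformly weighted sets."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform mass on observation features
    b = np.full(m, 1.0 / m)      # uniform mass on nominal features
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)        # rescale to match column marginals
        u = a / (K @ v)          # rescale to match row marginals
    return u[:, None] * K * v[None, :]

def ot_anomaly_scores(obs_feats, nominal_feats, reg=0.1):
    """Score each observation feature by its average transport cost (hypothetical scoring rule)."""
    diff = obs_feats[:, None, :] - nominal_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)              # pairwise squared Euclidean cost
    cost_scaled = cost / (cost.max() + 1e-12)    # scale to [0, 1] for numerical stability
    plan = sinkhorn_plan(cost_scaled, reg)
    # Expected unscaled cost of each observation under its row of the plan.
    return (plan * cost).sum(axis=1) / plan.sum(axis=1)
```

Under these assumptions, a feature that lies far from every nominal feature still has to ship its mass somewhere, so it receives a high score. If the features were image patches, reshaping the scores to the patch grid would give a coarse heatmap.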

📝 Abstract

Imitation learning (IL) policies in robotics deliver strong performance in controlled settings but remain brittle in real-world deployments: rare events such as hardware faults, defective parts, unexpected human actions, or any state that lies outside the training distribution can lead to failed executions. Vision-based Anomaly Detection (AD) methods have emerged as an appropriate solution to detect these anomalous failure states, but they do not distinguish failures from benign deviations. We introduce FIDeL (Failure Identification in Demonstration Learning), a policy-independent failure detection module. Leveraging recent AD methods, FIDeL builds a compact representation of nominal demonstrations and aligns incoming observations via optimal transport matching to produce anomaly scores and heatmaps. Spatio-temporal thresholds are derived with an extension of conformal prediction, and a Vision-Language Model (VLM) performs semantic filtering to discriminate benign anomalies from genuine failures. We also introduce BotFails, a multimodal dataset of real-world tasks for failure detection in robotics. FIDeL consistently outperforms state-of-the-art baselines, yielding +5.30% AUROC in anomaly detection and +17.38% failure-detection accuracy on BotFails compared to existing methods.
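The abstract does not spell out how conformal prediction is extended to obtain spatio-temporal thresholds. As a rough illustration of the underlying idea only, the sketch below shows standard split conformal thresholding applied independently at each time step, assuming a calibration array of anomaly scores from nominal demonstrations with shape `(n_demos, n_steps)`. The function names and the per-step treatment are assumptions, not the paper's procedure.

```python
import numpy as np

def conformal_threshold(calib_scores, alpha=0.05):
    """Split-conformal threshold from nominal calibration scores.

    Under exchangeability, a new nominal score exceeds the returned
    value with probability at most alpha.
    """
    scores = np.sort(np.asarray(calib_scores, dtype=float))
    n = scores.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration samples for this alpha
    return scores[k - 1]

def per_timestep_thresholds(calib_scores_by_step, alpha=0.05):
    """One threshold per time step from a (n_demos, n_steps) score array."""
    calib = np.asarray(calib_scores_by_step, dtype=float)
    return np.array(
        [conformal_threshold(calib[:, t], alpha) for t in range(calib.shape[1])]
    )
```

At run time, an observation at step `t` would be flagged as anomalous when its score exceeds the threshold for that step. In the pipeline the abstract describes, such a flag would then go to the VLM for semantic filtering rather than being treated as a failure directly.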

Problem

Research questions and friction points this paper is trying to address.

Imitation Learning

Failure Detection

Anomaly Detection

Robotics

Out-of-Distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Failure Identification

Imitation Learning

Vision-Language Model