Failure Prediction at Runtime for Generative Robot Policies

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative robot policies often fail unpredictably in unseen environments due to distribution shift and compounding action errors, compromising safety when robots operate alongside humans. To address this, we propose FIPER, a framework that jointly leverages anomaly detection in the policy's embedding space and entropy-based uncertainty quantification over action chunks, enabling real-time failure prediction without requiring any failure data. FIPER uses random network distillation to detect anomalous observations, aggregates both failure scores over short time windows, and calibrates their thresholds with conformal prediction, significantly improving both the accuracy and the timeliness of its alarms. Evaluated across five simulated and real-world environments under diverse failure modes, FIPER reliably distinguishes benign out-of-distribution situations from genuine failures, achieving on average 32% earlier warnings and a 41% reduction in false positives, making it a strong candidate for deploying generative robot policies in safety-critical settings.

📝 Abstract
Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
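The alarm rule described in the abstract — aggregate the OOD score and the action-chunk entropy score over short time windows, and trigger only when both exceed their calibrated thresholds — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fiper_alarm`, the choice of mean aggregation, and the window size are all assumptions.

```python
import numpy as np

def fiper_alarm(ood_scores, entropy_scores, tau_ood, tau_ent, window=5):
    """Illustrative sketch (not the paper's code): raise a failure alarm
    at time t only when BOTH failure-prediction scores, aggregated over a
    short sliding window, exceed their calibrated thresholds."""
    ood = np.asarray(ood_scores, dtype=float)
    ent = np.asarray(entropy_scores, dtype=float)
    alarms = []
    for t in range(len(ood)):
        lo = max(0, t - window + 1)
        ood_w = ood[lo:t + 1].mean()  # windowed aggregation (mean is an assumption)
        ent_w = ent[lo:t + 1].mean()
        alarms.append(bool(ood_w > tau_ood and ent_w > tau_ent))
    return alarms
```

Requiring both indicators to fire is what lets the method separate benign OOD situations (high OOD score, low action uncertainty) from genuine impending failures.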
Problem

Research questions and friction points this paper is trying to address.

Predicting robot policy failures without requiring failure data
Detecting out-of-distribution observations and action uncertainty
Improving safety for generative imitation learning in robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random network distillation detects OOD observations
Novel action-chunk entropy measures action uncertainty
Conformal prediction calibrates scores using successful rollouts
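The conformal calibration step — turning scores from a small set of successful rollouts into a failure-alarm threshold — can be sketched with a standard split-conformal quantile. The exact conformal variant used in the paper is not specified here, so treat this as a generic illustration; `conformal_threshold` and the choice of `alpha` are assumptions.

```python
import numpy as np

def conformal_threshold(calib_scores, alpha=0.05):
    """Generic split-conformal threshold (an illustration, not the paper's
    exact procedure): given scores from successful calibration rollouts,
    return the value a new successful rollout exceeds with probability
    at most ~alpha."""
    s = np.sort(np.asarray(calib_scores, dtype=float))
    n = len(s)
    # Conformal quantile index: ceil((n + 1) * (1 - alpha)), clipped to n
    # so the rule stays well-defined for small calibration sets.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return s[k - 1]
```

Because the threshold is set from successful rollouts only, no failure examples are ever needed, which matches the paper's stated design goal.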
Ralf Römer
Technical University of Munich
Machine Learning · Robotics · Embodied AI · VLA · Control
Adrian Kobras
Technical University of Munich, Germany; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI)
Luca Worbis
Technical University of Munich, Germany; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI)
Angela P. Schoellig
Technical University of Munich, Germany; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI); Robotics Institute Germany; Munich Center for Machine Learning