🤖 AI Summary
This work addresses the vulnerability of Vision-Language-Action (VLA) models in out-of-distribution (OOD) dynamic environments and their lack of real-time anomaly detection. The authors propose an unsupervised anomaly detection method tailored for robotic manipulation, which, for the first time, decouples task-aware robot states and object motion trajectories within a normalizing flow framework. Trained exclusively on positive samples, the method estimates probability densities to compute anomaly scores in real time. It plugs into existing VLA systems and supplies the OOD signal that triggers timely state rollback or task replanning. On the newly introduced LIBERO-Anomaly-10 simulation benchmark, the approach achieves state-of-the-art performance across all anomaly types. Real-robot experiments show a response latency under 100 ms, indicating improved robustness and adaptability of VLA systems in dynamic environments.
📝 Abstract
Recent advances in Vision-Language-Action (VLA) models have enabled robots to execute increasingly complex tasks. However, VLA models trained through imitation learning struggle to operate reliably in dynamic environments and often fail under Out-of-Distribution (OOD) conditions. To address this issue, we propose Robot-Conditioned Normalizing Flow (RC-NF), a real-time monitoring model for robotic anomaly detection and intervention that checks whether the robot's state and the object's motion trajectory align with the task. RC-NF decouples the processing of task-aware robot and object states within the normalizing flow. It requires only positive samples for unsupervised training and computes robotic anomaly scores at inference time from the estimated probability density. We further present LIBERO-Anomaly-10, a benchmark comprising three categories of robotic anomalies for simulation evaluation. Compared with previous methods for monitoring robotic tasks, RC-NF achieves state-of-the-art performance across all anomaly types. Real-world experiments demonstrate that RC-NF operates as a plug-and-play module for VLA models (e.g., pi0), providing a real-time OOD signal that enables state-level rollback or task-level replanning when necessary, with a response latency under 100 ms. These results demonstrate that RC-NF noticeably enhances the robustness and adaptability of VLA-based robotic systems in dynamic environments.
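To make the density-based scoring idea concrete, the sketch below shows one way an anomaly score can be derived from a conditional normalizing flow: the object state is mapped to a latent variable by an invertible transform whose parameters depend on the robot state, and the score is the negative log-likelihood under the change-of-variables formula. This is a toy, single-layer affine flow written for illustration only, not the RC-NF architecture. The class `ConditionalAffineFlow`, the functions `anomaly_score` and `is_anomalous`, the weight matrices, and the threshold value are all hypothetical names and numbers that do not come from the paper.

```python
import numpy as np


class ConditionalAffineFlow:
    """Toy one-layer conditional affine flow (illustrative assumption).

    Maps an object state x to a latent z = (x - mu(c)) * exp(-s(c)),
    where the shift mu and log-scale s depend on the robot state c.
    """

    def __init__(self, w_mu, w_s):
        self.w_mu = w_mu  # linear map from robot state to shift
        self.w_s = w_s    # linear map from robot state to log-scale

    def log_prob(self, x, c):
        mu = c @ self.w_mu
        log_scale = np.tanh(c @ self.w_s)  # bounded log-scale for stability
        z = (x - mu) * np.exp(-log_scale)
        # Log-density of z under a standard normal base distribution.
        log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)
        # Log |det dz/dx| for an elementwise affine transform.
        log_det = -np.sum(log_scale, axis=-1)
        return log_base + log_det


def anomaly_score(flow, x, c):
    """Negative log-likelihood: higher means less likely under the flow."""
    return -flow.log_prob(x, c)


def is_anomalous(score, threshold):
    """Flag a step as OOD when its score exceeds a chosen threshold."""
    return bool(score > threshold)


# Hand-picked weights stand in for a flow trained on positive samples.
flow = ConditionalAffineFlow(w_mu=np.eye(2), w_s=np.zeros((2, 2)))

robot_state = np.array([0.2, -0.1])
nominal_object = robot_state.copy()                     # object where expected
perturbed_object = robot_state + np.array([3.0, 0.0])   # object displaced

score_nominal = anomaly_score(flow, nominal_object, robot_state)
score_perturbed = anomaly_score(flow, perturbed_object, robot_state)
```

In a monitoring loop, a score above the threshold would be the signal that hands control to a rollback or replanning routine. How the threshold is chosen is left open here; calibrating it on held-out anomaly-free rollouts is one plausible option, stated as an assumption rather than as the authors' procedure.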