RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of Vision-Language-Action (VLA) models in out-of-distribution (OOD) dynamic environments and their lack of real-time anomaly detection capabilities. The authors propose an unsupervised anomaly detection method tailored for robotic manipulation, which, for the first time, decouples task-aware robot states and object motion trajectories within a normalizing flow framework. Trained exclusively on positive samples, the method estimates probability densities to compute anomaly scores in real time. It integrates seamlessly into existing VLA systems, enabling timely state rollback or task replanning. Evaluated on the newly introduced LIBERO-Anomaly-10 simulation benchmark, the approach achieves state-of-the-art performance, and real-robot experiments demonstrate a response latency under 100 ms, significantly enhancing the robustness and adaptability of VLA systems in dynamic environments.
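The density-based scoring described above can be written with the standard change-of-variables identity for a conditional normalizing flow. This is a generic sketch, not a formula confirmed by the paper: the symbols $x$ (object motion features), $c$ (task-aware robot state used as the condition), $f_\theta$ (the invertible flow), $p_Z$ (a standard Gaussian base density), and the threshold $\tau$ are all assumptions introduced for illustration.

```latex
\log p_\theta(x \mid c)
  = \log p_Z\!\big(f_\theta(x; c)\big)
  + \log \left| \det \frac{\partial f_\theta(x; c)}{\partial x} \right|,
\qquad
s(x, c) = -\log p_\theta(x \mid c),
\qquad
\text{flag an anomaly if } s(x, c) > \tau .
```

Under this formulation, training on anomaly-free samples amounts to maximizing $\log p_\theta(x \mid c)$, so observations that the flow has not seen in that context receive low density and therefore a high score.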

📝 Abstract
Recent advances in Vision-Language-Action (VLA) models have enabled robots to execute increasingly complex tasks. However, VLA models trained through imitation learning struggle to operate reliably in dynamic environments and often fail under Out-of-Distribution (OOD) conditions. To address this issue, we propose Robot-Conditioned Normalizing Flow (RC-NF), a real-time monitoring model for robotic anomaly detection and intervention that ensures the robot's state and the object's motion trajectory align with the task. RC-NF decouples the processing of task-aware robot and object states within the normalizing flow. It requires only positive samples for unsupervised training and calculates accurate robotic anomaly scores during inference through the probability density function. We further present LIBERO-Anomaly-10, a benchmark comprising three categories of robotic anomalies for simulation evaluation. RC-NF achieves state-of-the-art performance across all anomaly types compared to previous methods in monitoring robotic tasks. Real-world experiments demonstrate that RC-NF operates as a plug-and-play module for VLA models (e.g., pi0), providing a real-time OOD signal that enables state-level rollback or task-level replanning when necessary, with a response latency under 100 ms. These results demonstrate that RC-NF noticeably enhances the robustness and adaptability of VLA-based robotic systems in dynamic environments.
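As a rough illustration of how a conditioned flow can turn a probability density into an anomaly score, the sketch below fits a single conditional affine flow layer with NumPy. It is not the RC-NF architecture: the class `ConditionalAffineFlow`, the linear conditioning on the robot state, the synthetic data, and the 99th-percentile threshold are all assumptions made for the example.

```python
import numpy as np


class ConditionalAffineFlow:
    """Toy one-layer conditional flow: z = (x - mu(c)) / sigma (hypothetical, not RC-NF)."""

    def fit(self, x, c):
        # Append a bias column so mu(c) = W c + b is a single least-squares solve.
        c1 = np.hstack([c, np.ones((len(c), 1))])
        self.W, *_ = np.linalg.lstsq(c1, x, rcond=None)
        resid = x - c1 @ self.W
        # Per-dimension log-scale; its maximum-likelihood value is the residual std.
        self.log_sigma = np.log(resid.std(axis=0) + 1e-8)
        return self

    def log_prob(self, x, c):
        c1 = np.hstack([c, np.ones((len(c), 1))])
        z = (x - c1 @ self.W) * np.exp(-self.log_sigma)
        d = x.shape[1]
        # Standard Gaussian base density plus the log-determinant of the affine map.
        log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
        log_det = -self.log_sigma.sum()
        return log_base + log_det

    def anomaly_score(self, x, c):
        # Low density under the learned model means a high anomaly score.
        return -self.log_prob(x, c)


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))  # assumed relation between robot state and object motion

# Train only on normal (anomaly-free) pairs of robot state c and object motion x.
c_train = rng.normal(size=(2000, 3))
x_train = c_train @ A + 0.05 * rng.normal(size=(2000, 2))
flow = ConditionalAffineFlow().fit(x_train, c_train)

# Assumed thresholding rule: 99th percentile of the training scores.
threshold = np.percentile(flow.anomaly_score(x_train, c_train), 99)

c_test = rng.normal(size=(200, 3))
x_normal = c_test @ A + 0.05 * rng.normal(size=(200, 2))
x_anomalous = x_normal + 1.0  # object displaced from its expected motion

normal_scores = flow.anomaly_score(x_normal, c_test)
anomalous_scores = flow.anomaly_score(x_anomalous, c_test)
print("flagged normal:", (normal_scores > threshold).mean())
print("flagged anomalous:", (anomalous_scores > threshold).mean())
```

In a monitoring loop, a score above the threshold would be the OOD signal that triggers rollback or replanning; a practical flow would stack several nonlinear coupling layers rather than one affine map.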
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
Out-of-Distribution
anomaly detection
robotic manipulation
dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing Flow
Anomaly Detection
Robot Manipulation
Out-of-Distribution Detection
Unsupervised Learning
Shijie Zhou
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Bin Zhu
Assistant Professor, Singapore Management University
Multimedia, Computer Vision
Jiarui Yang
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Xiangyu Zhao
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Jingjing Chen
Fudan University
Multimedia, Computer Vision, Machine Learning, Pattern Recognition
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis, Embodied AI, Trustworthy AI