Multimodal Anomaly Detection for Human-Robot Interaction

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes MADRI, a multimodal anomaly detection framework intended to improve safety and reliability in human-robot collaboration. MADRI combines visual semantic features, the robot's proprioceptive signals, and scene graphs, and detects anomalies by reconstructing the resulting multimodal feature vectors. Video streams are first encoded into semantic vectors, which are then fused with internal sensor data and a structured scene representation so that the model can capture anomalies arising from both the external environment and the robot's internal state. Experiments show that reconstruction on visual features alone is effective for detecting anomalies, and that adding the other modalities further improves detection performance. To support the evaluation, the authors also collected a custom dataset of a simple pick-and-place task under normal and anomalous conditions.

📝 Abstract
Ensuring safety and reliability in human-robot interaction (HRI) requires the timely detection of unexpected events that could lead to system failures or unsafe behaviours. Anomaly detection thus plays a critical role in enabling robots to recognize and respond to deviations from normal operation during collaborative tasks. While reconstruction models have been actively explored in HRI, approaches that operate directly on feature vectors remain largely unexplored. In this work, we propose MADRI, a framework that first transforms video streams into semantically meaningful feature vectors before performing reconstruction-based anomaly detection. Additionally, we augment these visual feature vectors with the robot's internal sensor readings and a Scene Graph, enabling the model to capture both external anomalies in the visual environment and internal failures within the robot itself. To evaluate our approach, we collected a custom dataset consisting of a simple pick-and-place robotic task under normal and anomalous conditions. Experimental results demonstrate that reconstruction on vision-based feature vectors alone is effective for detecting anomalies, while incorporating other modalities further improves detection performance, highlighting the benefits of multimodal feature reconstruction for robust anomaly detection in human-robot collaboration.
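
The general idea of reconstruction-based detection on fused feature vectors can be illustrated with a small sketch. This is not the MADRI implementation: the abstract does not specify the reconstruction model, so a rank-1 linear (PCA-style) reconstructor stands in for it here. The function names, the feature dimensions (4 visual, 2 proprioceptive, 2 scene-graph values), the synthetic data, and the mean-plus-three-standard-deviations threshold are all assumptions made for illustration.

```python
import math
import random
import statistics

def fuse(visual, proprio, scene):
    """Concatenate per-frame modality vectors into one fused feature vector."""
    return list(visual) + list(proprio) + list(scene)

def fit_reconstructor(normal, n_iter=100):
    """Fit a rank-1 linear reconstructor on normal data only.

    Returns the feature mean and the top principal direction, found by
    power iteration (a stand-in for a learned reconstruction model).
    """
    n, dim = len(normal), len(normal[0])
    mean = [sum(x[j] for x in normal) / n for j in range(dim)]
    centered = [[x[j] - mean[j] for j in range(dim)] for x in normal]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(n_iter):
        # Multiply v by the (unnormalised) covariance without forming it.
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return mean, v

def reconstruction_error(x, mean, v):
    """Mean squared error between x and its rank-1 reconstruction."""
    c = [a - m for a, m in zip(x, mean)]
    coeff = sum(a * b for a, b in zip(c, v))
    return sum((a - coeff * b) ** 2 for a, b in zip(c, v)) / len(x)

# Synthetic "normal" frames: fused features vary along one latent direction.
rng = random.Random(0)
direction = [1.0, 0.5, -0.5, 1.0, 0.3, -0.2, 1.0, 0.0]

def make_normal_frame():
    t = rng.uniform(-1.0, 1.0)
    x = [t * d + rng.gauss(0.0, 0.01) for d in direction]
    return fuse(x[:4], x[4:6], x[6:])  # visual, proprioceptive, scene graph

train = [make_normal_frame() for _ in range(200)]
mean, v = fit_reconstructor(train)

# Threshold chosen from the errors observed on normal data (an assumption).
train_errors = [reconstruction_error(x, mean, v) for x in train]
threshold = statistics.mean(train_errors) + 3.0 * statistics.pstdev(train_errors)

# Simulated internal failure: one proprioceptive channel jumps.
anomalous = make_normal_frame()
anomalous[4] += 2.0
anomaly_score = reconstruction_error(anomalous, mean, v)
is_anomaly = anomaly_score > threshold
```

Because the reconstructor is fitted only on normal frames, a frame that departs from the learned structure in any modality reconstructs poorly and receives a high score. In this toy setup the fault sits in a proprioceptive channel, so a detector restricted to the four visual dimensions would not see it, which is the intuition behind fusing modalities before reconstruction.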
Problem

Research questions and friction points this paper is trying to address.

Multimodal Anomaly Detection
Human-Robot Interaction
Anomaly Detection
Safety
Reconstruction-based Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal anomaly detection
feature vector reconstruction
human-robot interaction
scene graph
sensor fusion