AI Summary
Existing research on handover failure primarily focuses on object slip or external disturbances, and lacks benchmark datasets and evaluation protocols for human-initiated, unavoidable failures (e.g., refusal to accept the object, failure to release it). This work introduces the first multimodal dataset specifically designed for such human-led, unavoidable handover failures, along with two baseline methods: video-based classification and joint temporal action segmentation. These enable real-time failure detection and causal attribution on robotic platforms. We formulate a novel joint temporal segmentation task that unifies human actions, robot actions, and handover outcomes. Our approach employs 3D CNNs for video modeling, combined with force-torque signal processing, gripper pose fusion, and multimodal temporal alignment. Experiments demonstrate that video is the most informative modality; incorporating force-torque and gripper-pose data improves failure detection accuracy by 12.3% and action segmentation mean average precision (mAP) by 9.7%.
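The summary mentions fusing video, force-torque, and gripper-pose signals for failure classification. The paper's exact architecture is not given here, so the following is only a minimal sketch of one common fusion strategy (late fusion by feature concatenation followed by a linear classifier); all feature dimensions, the label set, and the classifier weights are illustrative assumptions, not the authors' design.

```python
import numpy as np

def fuse_features(video_feat, ft_feat, pose_feat):
    """Late fusion: concatenate per-clip feature vectors from each modality.

    video_feat : embedding from a 3D CNN video backbone (512-d, assumed)
    ft_feat    : summary statistics of the force-torque signal (12-d, assumed)
    pose_feat  : gripper pose, e.g. position + quaternion (7-d, assumed)
    """
    return np.concatenate([video_feat, ft_feat, pose_feat])

def classify(fused, weights, bias):
    """Linear classifier over fused features; returns class probabilities."""
    logits = fused @ weights + bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy example with random features and untrained weights (shapes only).
rng = np.random.default_rng(0)
video_feat = rng.standard_normal(512)
ft_feat = rng.standard_normal(12)
pose_feat = rng.standard_normal(7)
fused = fuse_features(video_feat, ft_feat, pose_feat)  # 531-d vector

n_classes = 4  # hypothetical outcome classes, e.g. success / ignore / no-release / drop
W = rng.standard_normal((fused.size, n_classes)) * 0.01
b = np.zeros(n_classes)
probs = classify(fused, W, b)
```

In practice the per-modality encoders and the fusion head would be trained jointly; this sketch only shows how the three feature streams could be combined into a single prediction.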
Abstract
An object handover between a robot and a human is a coordinated action that is prone to failure for reasons such as miscommunication, incorrect actions, and unexpected object properties. Existing works on handover failure detection and prevention focus on failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach that jointly classifies the human action, the robot action, and the overall outcome of the handover. The results show that video is an important modality, but that force-torque data and gripper position help improve failure detection and action segmentation accuracy.
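The second baseline jointly segments the human action stream, the robot action stream, and the overall outcome over a shared timeline. As a minimal sketch of what the output of such a task looks like, the snippet below collapses per-frame labels into temporal segments and derives an outcome from the final human action; the label names and the outcome rule are hypothetical and stand in for whatever the trained segmentation model predicts.

```python
def segment(frame_labels):
    """Collapse a per-frame label sequence into (label, start, end) segments,
    with end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((frame_labels[start], start, i))
            start = i
    return segments

def joint_segmentation(human_frames, robot_frames):
    """Segment the human and robot streams over a shared timeline, then
    derive an overall outcome from the last human action (illustrative rule:
    a final 'ignore' or 'hold' indicates a human-induced failure)."""
    human_segs = segment(human_frames)
    robot_segs = segment(robot_frames)
    outcome = "failure" if human_segs[-1][0] in {"ignore", "hold"} else "success"
    return human_segs, robot_segs, outcome

# Example timeline: the human reaches, grasps, and retreats with the object
# while the robot approaches and then releases.
human = ["reach"] * 3 + ["grasp"] * 2 + ["retreat"] * 2
robot = ["approach"] * 4 + ["release"] * 3
human_segs, robot_segs, outcome = joint_segmentation(human, robot)
```

A real model would produce per-frame class scores from the multimodal input and could use a learned, rather than rule-based, mapping from the two action streams to the handover outcome.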