🤖 AI Summary
Robots frequently fail in real-world unstructured environments, generating vast volumes of unlabeled, semantically opaque failure logs that hinder systematic analysis and improvement.
Method: This paper introduces the first multimodal large language model (MLLM)-based framework for unsupervised semantic parsing of robotic failure logs. It combines perception-trajectory encoding with unsupervised cross-modal semantic clustering to automatically discover interpretable, high-level failure root-cause categories, closing the loop of "failure identification → online detection → policy optimization." Crucially, it pioneers MLLM-based unsupervised failure semantic modeling, eliminating reliance on manual annotations or handcrafted rules, and incorporates failure-driven data acquisition alongside a lightweight online anomaly-detection mechanism.
Results: Evaluated across multiple robotic platforms, the framework improves failure identification accuracy by 32% and shortens policy iteration cycles by a factor of 4.8, enabling real-time, adaptive safety interventions.
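To make the core idea concrete, here is a minimal sketch of unsupervised clustering over MLLM-generated failure descriptions. The example log texts, the bag-of-words representation, the similarity threshold, and the greedy single-pass clustering are all illustrative assumptions, not the paper's actual pipeline (which operates on perception trajectories with an MLLM in the loop):

```python
# Hedged sketch: cluster short failure-cause descriptions (as an MLLM might
# emit for each failure trajectory) without labels. Bag-of-words + greedy
# cosine clustering stand in for the paper's cross-modal semantic clustering.
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector for a short failure description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(descriptions, threshold=0.4):
    """Greedy single pass: join the most similar cluster above threshold."""
    clusters = []  # each: {"centroid": Counter, "members": [str]}
    for d in descriptions:
        v = bow(d)
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append({"centroid": Counter(v), "members": [d]})
        else:
            best["centroid"].update(v)  # running word-count sum as centroid
            best["members"].append(d)
    return clusters

# Hypothetical MLLM outputs for four failure trajectories
logs = [
    "gripper slipped while lifting mug",
    "gripper slipped while lifting bottle",
    "collision with table edge during reach",
    "collision with drawer during reach",
]
groups = cluster(logs)  # two semantic groups: grasp slips and collisions
```

In the paper's setting, the descriptions would come from an MLLM reasoning over raw perceptual trajectories, and the clusters would surface hypothesized root causes rather than surface word overlap.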
📝 Abstract
As robotic systems become increasingly integrated into real-world environments, ranging from autonomous vehicles to household assistants, they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving future performance; manually analyzing large-scale failure datasets, however, is impractical. In this work, we present a method for automatically organizing large-scale robotic failure data into semantically meaningful clusters, enabling scalable learning from failure without human supervision. Our approach leverages the reasoning capabilities of Multimodal Large Language Models (MLLMs), trained on internet-scale data, to infer high-level failure causes from raw perceptual trajectories and to discover interpretable structure within uncurated failure logs. The resulting semantic clusters reveal latent patterns and hypothesized causes of failure. We demonstrate that the discovered failure modes can guide targeted data collection for policy refinement, accelerating iterative improvement of agent policies and overall safety. We further show that these semantic clusters support online failure detection, offering a lightweight yet powerful safeguard for real-time adaptation. Together, these results show that the framework strengthens robot learning and robustness by transforming real-world failures into actionable, interpretable signals for adaptation.
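The online-detection use of the clusters can be sketched as nearest-prototype matching: once failure modes have been discovered, an incoming description is compared against a prototype per mode and flagged if it matches any of them. The prototype texts, mode names, and threshold below are hypothetical placeholders, not values from the paper:

```python
# Hedged sketch: lightweight online failure detection against prototypes of
# previously discovered failure modes. Everything here is illustrative; the
# paper's detector operates on multimodal trajectory features.
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector for a short description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical prototypes summarizing two discovered failure clusters
PROTOTYPES = {
    "grasp_slip": bow("gripper slipped object dropped during lift"),
    "collision": bow("arm collided with obstacle during reach"),
}

def detect(description, threshold=0.3):
    """Return (failure_mode, score) if any prototype matches, else (None, score)."""
    mode, score = max(
        ((m, cosine(bow(description), p)) for m, p in PROTOTYPES.items()),
        key=lambda t: t[1],
    )
    return (mode, score) if score >= threshold else (None, score)

alarm = detect("gripper slipped and the mug dropped")      # flags grasp_slip
nominal = detect("robot placed the cup down successfully")  # no mode matched
```

Because detection reduces to a handful of similarity comparisons, this kind of safeguard is cheap enough to run in the control loop, which is the sense in which the abstract calls it lightweight.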