🤖 AI Summary
To address the lack of interpretability and trustworthiness in robotic decision-making caused by the "black-box" nature of neural networks, this paper proposes a concept-level trustworthy explanation method tailored to robotic tasks. The approach maps internal neural activations to human-understandable, high-level semantic concepts and generates post-hoc explanations by aligning concept activations with human-interpretable visualizations. Crucially, it integrates uncertainty modeling to quantify the confidence of each explanation, which the paper presents as the first such incorporation in robotic XAI. Unlike existing XAI methods designed for NLP or computer vision, the framework is engineered specifically for robotic decision-making tasks, jointly optimizing semantic interpretability and trust assessment. Evaluations across diverse simulated and real-world robotic platforms demonstrate improvements in the human comprehensibility and diagnostic utility of the explanations. The work thereby establishes the first explainability framework for robotic learning systems that unifies concept-level semantics with quantitative trust calibration.
📝 Abstract
Black-box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when stakeholders, such as engineers and legislative bodies, lack insight into the neural networks' decision-making process. Existing explainable AI is primarily tailored to natural language processing and computer vision, and it falls short in two critical respects when applied to robots: grounding in decision-making tasks and the ability to assess the trustworthiness of its explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts that are attributed to the decisions made by the neural network. Our proposed technique provides explanations with associated uncertainty scores by matching the neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot learning diagnostic tool.
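The abstract does not spell out the mechanics, but the description (concept-level attribution over activations, plus an uncertainty score per explanation) is reminiscent of TCAV-style concept probing. The sketch below illustrates that general idea under stated assumptions: a linear probe over hidden-layer activations defines a concept direction, a concept score measures how often decision gradients align with it, and bootstrap resampling of the concept examples yields an uncertainty estimate. All function names and the synthetic data are illustrative assumptions, not the authors' actual method.

```python
# Hedged sketch: TCAV-style concept attribution with a bootstrap
# uncertainty score. This is NOT the paper's implementation; it only
# illustrates concept-to-activation matching with a trust signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear probe separating concept examples from random
    examples; its (normalized) normal vector is the concept direction."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),
                        np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def concept_score(decision_grads, cav):
    """Fraction of decision gradients positively aligned with the
    concept direction (a TCAV-style sensitivity score)."""
    return float(np.mean(decision_grads @ cav > 0))

def score_with_uncertainty(decision_grads, concept_acts, random_acts,
                           n_boot=50):
    """Bootstrap the concept set to attach an uncertainty estimate
    (std. dev. across resamples) to the concept score."""
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(concept_acts), len(concept_acts))
        cav = concept_activation_vector(concept_acts[idx], random_acts)
        scores.append(concept_score(decision_grads, cav))
    return float(np.mean(scores)), float(np.std(scores))

# Toy usage with synthetic activations standing in for a robot
# policy's hidden-layer activations and decision gradients.
d = 32
concept_acts = rng.normal(0.5, 1.0, (100, d))
random_acts = rng.normal(0.0, 1.0, (100, d))
decision_grads = rng.normal(0.2, 1.0, (200, d))

mean_score, unc = score_with_uncertainty(decision_grads,
                                         concept_acts, random_acts)
print(f"concept score {mean_score:.2f} ± {unc:.2f}")
```

In this reading, a high concept score with low bootstrap variance would correspond to a trustworthy explanation, while a high-variance score would flag an explanation the diagnostic tool should not be trusted on; the paper's actual uncertainty model may differ.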