🤖 AI Summary
To address the lack of transparency and interpretability in causal explanations for autonomous driving human–machine interaction, this paper proposes a causal inference framework based on implicit reward modeling. Methodologically, it introduces a learnable reward profile as the core mediator for generating causal explanations, unifying inverse reinforcement learning with structural causal models and integrating multi-task optimization and differentiable causal discovery to enable counterfactual reasoning and semantically interpretable inference over multi-vehicle interactions. Experiments on three real-world driving datasets demonstrate a functional improvement over previous methods and competitive performance across key evaluation metrics, including explanation fidelity, consistency, and human comprehensibility.
📝 Abstract
Transparency and explainability are important features that responsible autonomous vehicles should possess, particularly when interacting with humans, and causal reasoning offers a strong basis for providing these qualities. However, even if one assumes agents act to maximise some concept of reward, it is difficult to make accurate causal inferences about agent planning without capturing what is of importance to the agent. Our work therefore aims to learn a weighting of reward metrics for agents such that explanations for agent interactions can be causally inferred. We validate our approach quantitatively and qualitatively across three real-world driving datasets, demonstrating a functional improvement over previous methods and competitive performance across evaluation metrics.
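The abstract's core idea of learning a weighting over reward metrics can be illustrated with a generic inverse-reinforcement-learning sketch. The paper does not give its formulation here, so the snippet below is only a minimal stand-in: it assumes a linear reward r(s) = w·φ(s) over hand-picked metric features (e.g., progress, comfort) and fits the weights by classic maximum-entropy-style feature matching, not by the authors' implicit reward modeling or causal machinery.

```python
import numpy as np

def learn_reward_weights(expert_feats, sampled_feats, lr=0.1, iters=200):
    """Illustrative linear IRL: match expected feature counts.

    expert_feats:  (N, d) per-trajectory feature sums for demonstrations
    sampled_feats: (M, d) feature sums for candidate trajectories
    Returns a weight vector w over the d reward metrics, updated by
    w <- w + lr * (mu_expert - mu_model(w)).
    """
    mu_expert = expert_feats.mean(axis=0)
    w = np.zeros(expert_feats.shape[1])
    for _ in range(iters):
        # Softmax distribution over candidate trajectories under current reward.
        scores = sampled_feats @ w
        p = np.exp(scores - scores.max())
        p /= p.sum()
        mu_model = p @ sampled_feats  # model's expected feature counts
        w += lr * (mu_expert - mu_model)
    return w

# Toy example with two hypothetical metrics: (progress, comfort).
# The demonstrations favour comfort over raw progress.
expert = np.array([[1.0, 2.0], [1.2, 1.8]])
candidates = np.array([[2.0, 0.5], [1.1, 1.9], [0.5, 2.5]])
w = learn_reward_weights(expert, candidates)
```

Once such a weighting is learned, the relative magnitudes of `w` indicate what mattered to the agent, which is the kind of quantity a causal explanation of its planning could then condition on.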