🤖 AI Summary
Current autonomous driving scene understanding methods suffer from insufficient behavioral interpretability and incomplete spatiotemporal relationship modeling. Method: We propose a holistic graph-structured reasoning framework that integrates Qualitative eXplainable Graphs (QXGs) with Graph Neural Networks (GNNs). For the first time, QXGs are deeply embedded into GNNs to construct scene-level spatiotemporal graphs, explicitly encoding multi-hop, multi-type qualitative spatial and causal relations among traffic agents—overcoming the limitation of conventional single-relation-chain modeling. Additionally, an imbalance-robust learning mechanism is introduced to enhance critical object recognition accuracy. Contribution/Results: Evaluated on the nuScenes+DriveLM joint dataset, our method significantly outperforms baselines, achieving both high recognition performance and end-to-end causal interpretability—thereby unifying accuracy and explainability in autonomous driving perception.
📝 Abstract
This paper investigates the integration of graph neural networks (GNNs) with Qualitative Explainable Graphs (QXGs) for scene understanding in automated driving. Scene understanding is the basis for any further reactive or proactive decision-making. Scene understanding and related reasoning is inherently an explanation task: why is another traffic participant doing something, what or who caused their actions? While previous work demonstrated QXGs' effectiveness using shallow machine learning models, these approaches were limited to analysing single relation chains between object pairs, disregarding the broader scene context. We propose a novel GNN architecture that processes entire graph structures to identify relevant objects in traffic scenes. We evaluate our method on the nuScenes dataset enriched with DriveLM's human-annotated relevance labels. Experimental results show that our GNN-based approach achieves superior performance compared to baseline methods. The model effectively handles the inherent class imbalance in relevant object identification tasks while considering the complete spatial-temporal relationships between all objects in the scene. Our work demonstrates the potential of combining qualitative representations with deep learning approaches for explainable scene understanding in autonomous driving systems.