🤖 AI Summary
Graph neural networks (GNNs) for remote sensing image analysis suffer from poor interpretability, and post-hoc explanations of their predictions tend to be distorted and insufficiently sparse. Method: We propose a self-explaining GNN framework tailored for vision tasks, the Interpretable Window Vision GNN (i-WiViG). Its core innovations are: (1) window-constrained local receptive field graph encoding, which balances long-range dependency modeling with local structural preservation; and (2) a self-explaining graph bottleneck mechanism that generates faithful, sparse subgraph-level explanations during forward propagation, so the explanation is part of the model rather than a post-hoc approximation. The method requires no auxiliary explainer and jointly outputs predictions and their corresponding critical subgraphs. Results: On remote sensing classification and regression benchmarks, the model achieves competitive performance. It reduces the infidelity of post-hoc explanations compared to existing Vision GNNs while maintaining >85% subgraph sparsity, thereby enhancing model trustworthiness and practical utility.
📝 Abstract
Deep learning models based on graph neural networks have emerged as a popular approach for solving computer vision problems. They encode the image into a graph structure and can efficiently capture the long-range dependencies typically present in remote sensing imagery. However, an important drawback of these methods is their black-box nature, which may hamper their wider usage in critical applications. In this work, we tackle the self-interpretability of graph-based vision models by proposing the Interpretable Window Vision GNN (i-WiViG) approach, which provides explanations by automatically identifying the subgraphs relevant to the model prediction. This is achieved with window-based image graph processing, which constrains the node receptive field to a local image region, and with a self-interpretable graph bottleneck that ranks the importance of the long-range relations between the image regions. We evaluate our approach on remote sensing classification and regression tasks, showing that it achieves competitive performance while providing inherent and faithful explanations through the identified relations. Further, the quantitative evaluation reveals that our model reduces the infidelity of post-hoc explanations compared to other Vision GNN models, without sacrificing explanation sparsity.
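The two ideas named in the abstract, window-constrained graph construction and a bottleneck that ranks relations between image regions, can be illustrated with a small NumPy sketch. This is not the i-WiViG implementation. The function names (`window_knn_edges`, `rank_region_relations`), the non-overlapping square windows, the feature-space k-nearest-neighbour rule, the mean-pooled region features, and the dot-product scorer are all assumptions made for illustration; in a trained model the relation scores would come from learned parameters.

```python
import numpy as np


def window_knn_edges(features, grid, window, k):
    """Link each patch to its k nearest patches (feature space) inside its own window.

    features: (H*W, D) patch embeddings in row-major order.
    grid:     (H, W) number of patches along each image side.
    window:   side length, in patches, of the non-overlapping square windows (assumed).
    """
    H, W = grid
    rows, cols = np.divmod(np.arange(H * W), W)
    windows_per_row = int(np.ceil(W / window))
    window_id = (rows // window) * windows_per_row + (cols // window)

    edges = []
    for wid in np.unique(window_id):
        idx = np.flatnonzero(window_id == wid)
        kk = min(k, len(idx) - 1)
        if kk <= 0:
            continue  # a single-patch window has no neighbours
        feats = features[idx]
        dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # exclude self-loops
        nearest = np.argsort(dist, axis=1)[:, :kk]
        for i, neighbours in enumerate(nearest):
            edges.extend((idx[i], idx[j]) for j in neighbours)
    return np.array(edges, dtype=int), window_id


def rank_region_relations(region_feats, keep_ratio):
    """Score every directed relation between regions and keep the top fraction.

    The scaled dot-product scorer is a hypothetical stand-in for a learned one.
    """
    n, d = region_feats.shape
    scores = region_feats @ region_feats.T / np.sqrt(d)
    src, dst = np.nonzero(~np.eye(n, dtype=bool))  # all pairs except self-relations
    edge_scores = scores[src, dst]
    n_keep = max(1, int(round(keep_ratio * len(edge_scores))))
    order = np.argsort(-edge_scores)[:n_keep]  # highest score first
    return np.stack([src[order], dst[order]], axis=1), edge_scores[order]


# Toy usage: an 8x8 grid of random patch embeddings split into 4x4 windows.
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
features = rng.normal(size=(H * W, D))

edges, window_id = window_knn_edges(features, (H, W), window=4, k=3)

# One feature vector per window (mean pooling is an assumption).
region_feats = np.stack(
    [features[window_id == w].mean(axis=0) for w in np.unique(window_id)]
)
kept, kept_scores = rank_region_relations(region_feats, keep_ratio=0.25)
```

In this sketch the local edges never cross a window boundary, so each node's receptive field stays inside one image region, and the retained region-to-region relations in `kept` play the role of the sparse subgraph that would be shown as the explanation.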