On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the applicability of explainable AI (xAI) methods and evaluation metrics to remote sensing (RS) image scene classification. The authors systematically evaluate ten explanation metrics spanning five categories (faithfulness, robustness, localization, complexity, randomization), applied to five established feature attribution methods (Occlusion, LIME, GradCAM, LRP, DeepLIFT) across three representative RS datasets. The analysis shows that perturbation-based methods are sensitive to the choice of perturbation baseline and to the spatial structure of RS scenes; gradient-based methods such as GradCAM produce unstable attributions when multiple labels appear in the same image; and LRP can distribute relevance disproportionately to the spatial extent of classes. On the metric side, faithfulness metrics inherit the weaknesses of perturbation-based methods, and localization and complexity metrics are unreliable for classes with a large spatial extent, whereas robustness and randomization metrics remain the most stable. Based on these findings, the paper provides practical guidelines for selecting explanation methods, metrics, and key hyperparameters in RS image scene classification.
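The baseline dependence of perturbation-based methods can be seen in a minimal sketch of Occlusion attribution (not the paper's implementation; the toy `score_fn` and all parameter names are illustrative assumptions): a patch is slid over the image, replaced with a baseline value, and the resulting drop in the class score is recorded as that region's attribution.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2, baseline=0.0):
    """Occlusion sensitivity sketch: slide a patch over the image,
    replace it with a baseline value, and record the drop in the
    model's class score. Larger drops = more important regions."""
    h, w = image.shape
    ref = score_fn(image)
    attr = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # Attribution of the patch = score drop when it is hidden.
            attr[i:i + patch, j:j + patch] = ref - score_fn(occluded)
    return attr

# Toy "model": the class score is the mean intensity of the
# top-left quadrant, so attribution should concentrate there.
def score_fn(x):
    return float(x[:2, :2].mean())

img = np.ones((4, 4))
attr = occlusion_map(img, score_fn, patch=2, baseline=0.0)
```

With `baseline=0.0` the top-left quadrant gets attribution 1.0 and the rest 0.0. Setting `baseline=1.0` (the same value as the pixels) makes the whole map vanish, which is exactly the baseline-sensitivity problem the study highlights for homogeneous RS scenes.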

📝 Abstract
The development of explainable artificial intelligence (xAI) methods for scene classification problems has attracted great attention in remote sensing (RS). Most xAI methods and the related evaluation metrics in RS are initially developed for natural images considered in computer vision (CV), and their direct usage in RS may not be suitable. To address this issue, in this paper, we investigate the effectiveness of explanation methods and metrics in the context of RS image scene classification. In detail, we methodologically and experimentally analyze ten explanation metrics spanning five categories (faithfulness, robustness, localization, complexity, randomization), applied to five established feature attribution methods (Occlusion, LIME, GradCAM, LRP, and DeepLIFT) across three RS datasets. Our methodological analysis identifies key limitations in both explanation methods and metrics. The performance of perturbation-based methods, such as Occlusion and LIME, heavily depends on perturbation baselines and spatial characteristics of RS scenes. Gradient-based approaches like GradCAM struggle when multiple labels are present in the same image, while some relevance propagation methods (LRP) can distribute relevance disproportionately relative to the spatial extent of classes. Analogously, we find limitations in evaluation metrics. Faithfulness metrics share the same problems as perturbation-based methods. Localization metrics and complexity metrics are unreliable for classes with a large spatial extent. In contrast, robustness metrics and randomization metrics consistently exhibit greater stability. Our experimental results support these methodological findings. Based on our analysis, we provide guidelines for selecting explanation methods, metrics, and hyperparameters in the context of RS image scene classification.
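The robustness metrics that the abstract singles out as stable typically measure how much an explanation changes under small input perturbations. A minimal max-sensitivity-style sketch (an assumption about the general metric family, not the paper's exact formulation; `explain_fn` and all parameters are illustrative):

```python
import numpy as np

def max_sensitivity(explain_fn, image, radius=0.05, n_samples=10, seed=0):
    """Robustness metric sketch: perturb the input with small uniform
    noise and report the largest change in the explanation (L2 norm).
    Lower values mean the explanation is more robust."""
    rng = np.random.default_rng(seed)
    base = explain_fn(image)
    worst = 0.0
    for _ in range(n_samples):
        noise = rng.uniform(-radius, radius, size=image.shape)
        diff = float(np.linalg.norm(explain_fn(image + noise) - base))
        worst = max(worst, diff)
    return worst

# A constant explainer is perfectly robust and scores 0, while an
# explainer that passes input noise straight through scores > 0.
stable_score = max_sensitivity(lambda x: np.ones_like(x), np.zeros((4, 4)))
noisy_score = max_sensitivity(lambda x: x, np.zeros((4, 4)))
```

Because the metric only compares an explanation against itself under perturbation, it needs no perturbation baseline and no ground-truth localization mask, which is consistent with the abstract's finding that robustness (and randomization) metrics behave most stably on RS imagery.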
Problem

Research questions and friction points this paper is trying to address.

Evaluating xAI method effectiveness in remote sensing scene classification
Identifying limitations of existing xAI metrics for RS image analysis
Providing guidelines for selecting appropriate explanation methods in RS
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes ten explanation metrics in five categories
Evaluates five feature attribution methods on RS datasets
Provides guidelines for method and metric selection