🤖 AI Summary
Existing XAI fidelity evaluation methods are biased either by out-of-distribution (OOD) perturbations or by information leakage from explanation-guided model retraining. To address this, the authors propose F-Fidelity, an evaluation framework that is robust to OOD inputs and free of information leakage. It removes dependence on any particular explainer via explanation-agnostic fine-tuning, and uses randomized input masking so that the feature-removal step stays in-distribution; controlled degradation experiments verify the framework's correctness. Validated across image, time-series, and text benchmarks, F-Fidelity substantially improves recovery of the ground-truth ranking of explanation methods. Both theoretical analysis and empirical results further show that, given a faithful explainer, F-Fidelity can be used to infer the sparsity of the true explanation, i.e., its size.
📝 Abstract
Recent research has developed a number of eXplainable AI (XAI) techniques, such as gradient-based approaches, input perturbation-based methods, and black-box explanation methods. While these XAI techniques can extract meaningful insights from deep learning models, how to properly evaluate them remains an open problem. The most widely used approach is to perturb or even remove what the XAI method considers the most important features of an input and observe the changes in the output prediction. This approach, although straightforward, suffers from the Out-of-Distribution (OOD) problem, as the perturbed samples may no longer follow the original data distribution. A recent method, RemOve And Retrain (ROAR), solves the OOD issue by retraining the model with perturbed samples guided by explanations. However, using a model retrained on the outputs of XAI methods to evaluate those same explainers may cause information leakage and thus lead to unfair comparisons. We propose Fine-tuned Fidelity (F-Fidelity), a robust evaluation framework for XAI, which utilizes i) an explanation-agnostic fine-tuning strategy, mitigating the information-leakage issue, and ii) a random masking operation that ensures the removal step does not generate OOD inputs. We also design controlled experiments with state-of-the-art (SOTA) explainers and their degraded versions to verify the correctness of our framework. We conduct experiments on multiple data modalities: images, time series, and natural language. The results demonstrate that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of explainers. Furthermore, we show both theoretically and empirically that, given a faithful explainer, the F-Fidelity metric can be used to compute the sparsity of influential input components, i.e., to extract the true explanation size.
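To make the removal-based evaluation idea concrete, here is a minimal sketch of a perturbation fidelity score with random masking on a toy linear "model". All names, the zero-masking scheme, and the masking fraction are illustrative assumptions for exposition; this is not the paper's implementation (which fine-tunes a real network under the same random-masking distribution before evaluating).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a fixed linear scorer over a 10-dim input (stand-in for a
# network fine-tuned to tolerate randomly masked inputs).
w = np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.0])

def model(x):
    return float(w @ x)

def fidelity_with_random_masking(x, importance, k=3, mask_frac=0.2, n_samples=200):
    """Average prediction drop when the explanation's top-k features are
    removed, with an extra random fraction of the remaining features also
    masked so perturbed inputs match a random-masking distribution.
    Higher drop = more faithful explanation."""
    top_k = np.argsort(importance)[-k:]          # indices the explainer ranks highest
    rest = np.setdiff1d(np.arange(len(x)), top_k)
    base = model(x)
    drops = []
    for _ in range(n_samples):
        xm = x.copy()
        xm[top_k] = 0.0                          # remove the explained features
        xm[rest[rng.random(len(rest)) < mask_frac]] = 0.0  # random extra masking
        drops.append(base - model(xm))
    return float(np.mean(drops))

x = np.ones(10)
good_expl = w          # importance aligned with the true weights
bad_expl = w[::-1]     # degraded explainer: highlights the least important features
print(fidelity_with_random_masking(x, good_expl) >
      fidelity_with_random_masking(x, bad_expl))  # True: faithful explainer scores higher
```

Because the random extra masking is applied identically under both explanations, it cancels out in expectation; what separates the scores is only how much true signal each explanation's top-k actually covers, which is the controlled-degradation check the paper uses to validate a metric.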