🤖 AI Summary
Existing feature-attribution methods based on weak abductive explanations (WAXps) compute feature importance solely from WAXp sets, neglecting the information carried by the complementary (non-WAXp) sets, in particular their relationship with formal explanations (XPs) and adversarial examples (AExs). This work incorporates non-WAXp sets into feature importance computation, proposing two novel game-theoretic scores based on the Shapley value and the Banzhaf index. Exploiting the connection between XPs and AExs, the new scores quantify how effective each feature is at excluding adversarial examples, yielding a more complete attribution for high-stakes uses of machine learning models. The paper also identifies properties of the proposed scores and studies their computational complexity.
📝 Abstract
Feature attribution methods based on game theory are ubiquitous in the field of eXplainable Artificial Intelligence (XAI). Recent works proposed rigorous feature attribution using logic-based explanations, specifically targeting high-stakes uses of machine learning (ML) models. Typically, such works exploit weak abductive explanations (WAXps) as the characteristic function to assign importance to features. However, one possible downside is that the contribution of non-WAXp sets is neglected. In fact, non-WAXp sets can also convey important information, because of the relationship between formal explanations (XPs) and adversarial examples (AExs). Accordingly, this paper leverages the Shapley value and the Banzhaf index to devise two novel feature importance scores. The new scores take non-WAXp sets into account when computing feature contributions, and they quantify how effective each feature is at excluding AExs. Furthermore, the paper identifies properties of the proposed scores and studies their computational complexity.
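To make the game-theoretic setup concrete, here is a minimal sketch of how a WAXp-style characteristic function can feed the Shapley value and the Banzhaf index. This is not the paper's method: `is_waxp`, the exhaustive sufficiency check, and the toy majority model are illustrative assumptions; the characteristic function simply assigns value 1 to a coalition of features whose fixed values are sufficient for the prediction (a weak abductive explanation) and 0 otherwise.

```python
from itertools import combinations, product
from math import factorial

def is_waxp(fixed, model, instance, domain):
    """Hypothetical WAXp test: fixing the features in `fixed` to their values
    in `instance` forces the model's prediction for every completion of the
    remaining (free) features. Checked here by brute-force enumeration; real
    implementations would query a formal reasoning oracle instead."""
    free = [i for i in range(len(instance)) if i not in fixed]
    target = model(instance)
    for vals in product(*(domain[i] for i in free)):
        point = list(instance)
        for i, v in zip(free, vals):
            point[i] = v
        if model(tuple(point)) != target:
            return False
    return True

def shapley_scores(model, instance, domain):
    """Exact Shapley values of the 0/1 WAXp characteristic function."""
    n = len(instance)
    v = lambda S: 1.0 if is_waxp(set(S), model, instance, domain) else 0.0
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (v(S + (i,)) - v(S))
        scores.append(phi)
    return scores

def banzhaf_scores(model, instance, domain):
    """Exact Banzhaf indices: average marginal contribution over all
    coalitions not containing feature i."""
    n = len(instance)
    v = lambda S: 1.0 if is_waxp(set(S), model, instance, domain) else 0.0
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                total += v(S + (i,)) - v(S)
        scores.append(total / 2 ** (n - 1))
    return scores

# Toy example: majority vote over three Boolean features at instance (1,1,1).
# Any pair of features fixed to 1 already forces the prediction, so exactly
# the coalitions of size >= 2 are WAXps.
maj = lambda x: int(x[0] + x[1] + x[2] >= 2)
print(shapley_scores(maj, (1, 1, 1), [(0, 1)] * 3))  # symmetric: each 1/3
print(banzhaf_scores(maj, (1, 1, 1), [(0, 1)] * 3))  # symmetric: each 1/2
```

Both enumerations are exponential in the number of features, which is consistent with the abstract's focus on the computational complexity of such scores; the example only illustrates the definitions on a tractable toy model.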