🤖 AI Summary
This paper addresses the lack of a unified evaluation framework for variable importance measures (VIMs). We first establish an axiomatic framework for VIMs grounded in information gain, formally specifying essential properties (including reasonableness, consistency, and confounding robustness) that any valid VIM must satisfy. Building upon this foundation, we propose a general, principled pipeline for constructing and evaluating VIMs, integrating information theory, statistical inference, and variable selection theory. Our framework systematically characterizes and compares the behavior of prominent VIMs, including Permutation Importance, SHAP, and Gini Importance, under the proposed axiomatic constraints. Crucially, it bridges the theoretical gap between VIM evaluation and variable selection. The results provide both a rigorous theoretical foundation and practical guidelines for the scientific selection, formal validation, and reliable deployment of VIMs in interpretable AI systems.
📝 Abstract
Variable importance measures (VIMs) aim to quantify the contribution of each input covariate to the predictability of a given output. With the growing interest in explainable AI, numerous VIMs have been proposed, many of which are heuristic in nature. Their heuristic character is often justified by the inherent subjectivity of the notion of importance, but it raises important questions regarding usage: What makes a good VIM? How can we compare different VIMs?
In this paper, we address these questions in two ways. (1) We propose an axiomatic framework that bridges the gap between variable importance and variable selection. This framework formalizes the intuitive principle that features providing no additional information should not be assigned importance, which helps avoid false positives due to spurious correlations that can arise with popular methods such as Shapley values. (2) We introduce a general pipeline for constructing VIMs, which clarifies the objective of various VIMs and thus facilitates meaningful comparisons. This approach is natural in statistics, but the literature has diverged from it.
Finally, we provide an extensive set of examples to guide practitioners in selecting and estimating appropriate indices aligned with their specific goals and data.
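The principle that a feature providing no additional information should receive no importance can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes a linear data-generating process, uses in-sample R² from ordinary least squares as a stand-in for predictability, and compares a marginal measure (fit on the feature alone) with a conditional, leave-one-covariate-out measure. The helper `r_squared`, the variable names, and the noise scales are all hypothetical choices made for illustration.

```python
import numpy as np


def r_squared(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return 1.0 - residual.var() / target.var()


# Assumed toy setup: y depends on x1 only; x2 is a noisy copy of x1,
# so x2 is correlated with y but adds no information once x1 is known.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
y = x1 + 0.5 * rng.normal(size=n)

# Marginal importance of x2: how well x2 alone predicts y.
marginal_x2 = r_squared(x2.reshape(-1, 1), y)

# Conditional importance of x2: gain in R^2 from adding x2 to a model
# that already contains x1.
conditional_x2 = (
    r_squared(np.column_stack([x1, x2]), y) - r_squared(x1.reshape(-1, 1), y)
)

print(f"marginal importance of x2:    {marginal_x2:.3f}")
print(f"conditional importance of x2: {conditional_x2:.5f}")
```

Under these assumptions the marginal score for `x2` is large purely because of its correlation with `x1`, while the conditional score is close to zero. This is the kind of false positive that a zero-importance requirement is meant to rule out.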