🤖 AI Summary
This work proposes causal context Shapley (cc-Shapley), a novel approach that integrates the causal structure of the data-generating process into Shapley value computation, addressing the vulnerability of traditional feature importance measures to spurious associations such as those induced by collider bias and suppression effects. By incorporating interventional corrections within a causal framework, cc-Shapley evaluates feature contributions in a manner that accounts for the underlying causal relationships. The authors show theoretically that this eliminates spurious correlations arising from collider bias. Empirical evaluations on both synthetic and real-world datasets demonstrate that cc-Shapley yields more accurate and reliable feature importance explanations than conventional Shapley-based methods.
📝 Abstract
Explainable artificial intelligence promises to yield insights into relevant features, thereby enabling humans to examine and scrutinize machine learning models or even facilitating scientific discovery. Considering the widespread technique of Shapley values, we find that purely data-driven operationalization of multivariate feature importance is unsuitable for such purposes. Even for simple problems with two features, spurious associations due to collider bias and suppression arise from considering one feature only in the observational context of the other, which can lead to misinterpretations. Causal knowledge about the data-generating process is required to identify and correct such misleading feature attributions. We propose cc-Shapley (causal context Shapley), an interventional modification of conventional observational Shapley values that leverages knowledge of the data's causal structure, thereby analyzing the relevance of a feature in the causal context of the remaining features. We show theoretically that this eradicates spurious associations induced by collider bias. We compare the behavior of Shapley and cc-Shapley values on various synthetic and real-world datasets. We observe nullification or reversal of associations compared to univariate feature importance when moving from observational Shapley values to cc-Shapley values.
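The collider-bias mechanism the abstract refers to can be reproduced in a few lines. The following sketch (not from the paper; the variable names and setup are illustrative assumptions) generates two independent features and a collider that both of them cause. Marginally, the features are uncorrelated, but once we condition on the collider, as an observational (conditional) value function implicitly does, a spurious negative association appears:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# x1 and x2 are independent causes; y is a common effect (collider).
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = x1 + x2 + rng.standard_normal(n)

# Marginal association between the two features: near zero by construction.
marginal_corr = np.corrcoef(x1, x2)[0, 1]

# Conditioning on the collider: regress y out of each feature and
# correlate the residuals (the partial correlation given y).
beta1 = np.cov(x1, y)[0, 1] / np.var(y)
beta2 = np.cov(x2, y)[0, 1] / np.var(y)
r1 = x1 - beta1 * y
r2 = x2 - beta2 * y
partial_corr = np.corrcoef(r1, r2)[0, 1]

print(f"marginal corr(x1, x2)      = {marginal_corr:+.3f}")
print(f"partial corr(x1, x2 | y)   = {partial_corr:+.3f}")
```

With this data-generating process the partial correlation settles near -0.5: conditioning on the common effect manufactures an association that does not exist causally. An interventional value function, as used by cc-Shapley, breaks the dependence on the collider instead of conditioning on it, which is why the spurious term vanishes.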