Disentangled Feature Importance

📅 2025-06-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard feature importance methods systematically underestimate importance when features are dependent. Method: This paper introduces the Disentangled Feature Importance (DFI) framework—the first to leverage optimal transport for feature importance disentanglement—by nonparametrically mapping original features to an independent latent space via the Bures–Wasserstein transport map, enabling an additive decomposition of total prediction variance while preserving predictive fidelity. Contribution/Results: DFI enjoys root-n consistency and asymptotic normality under mild conditions; it avoids modeling conditional distributions or refitting submodels, ensuring both statistical robustness and computational efficiency. Empirical results demonstrate that DFI achieves second-order error decay across arbitrary dependence structures, substantially improving estimation accuracy, stability, and scalability compared to existing approaches.

📝 Abstract
Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce \emph{Disentangled Feature Importance (DFI)}, a nonparametric generalization of the classical $R^2$ decomposition via optimal transport. DFI transforms correlated features into independent latent variables using a transport map, eliminating correlation distortion. Importance is computed in this disentangled space and attributed back through the transport map's sensitivity. DFI provides a principled decomposition of importance scores that sum to the total predictive variability for latent additive models and to interaction-weighted functional ANOVA variances more generally, under arbitrary feature dependencies. We develop a comprehensive semiparametric theory for DFI. For general transport maps, we establish root-$n$ consistency and asymptotic normality of importance estimators in the latent space, which extends to the original feature space for the Bures–Wasserstein map. Notably, our estimators achieve second-order estimation error, which vanishes if both regression function and transport map estimation errors are $o_{\mathbb{P}}(n^{-1/4})$. By design, DFI avoids the computational burden of repeated submodel refitting and the challenges of conditional covariate distribution estimation, thereby achieving computational efficiency.
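The transport step described in the abstract can be sketched for the Gaussian case, where the Bures–Wasserstein optimal transport map to an independent standard-normal latent space reduces to linear whitening. This is a minimal illustration of the idea, not the paper's implementation; the covariance matrix `Sigma` is an assumption made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated correlated Gaussian features (covariance chosen for illustration).
n, d = 5000, 3
Sigma = np.array([[1.0, 0.8, 0.2],
                  [0.8, 1.0, 0.1],
                  [0.2, 0.1, 1.0]])
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# For centered Gaussian features, the Bures-Wasserstein transport map to
# N(0, I) is the linear whitening T(x) = Sigma^{-1/2} x.
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = X @ Sigma_inv_sqrt  # disentangled (uncorrelated) latent features

# The latent covariance is the identity by construction.
print(np.allclose(np.cov(Z, rowvar=False), np.eye(d), atol=1e-8))  # True
```

In the nonparametric setting the paper studies, the transport map must be estimated rather than derived in closed form, but the Gaussian case conveys the geometry: importance is measured on the whitened coordinates and attributed back through the map's sensitivity.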
Problem

Research questions and friction points this paper is trying to address.

Quantify feature importance with correlated predictors
Eliminate correlation-induced bias in importance estimation
Provide computationally efficient nonparametric importance decomposition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled Feature Importance via optimal transport
Transforms features into independent latent variables
Root-n consistent, asymptotically normal importance estimators
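The additive decomposition the method targets can be illustrated with a small simulation (the component functions below are hypothetical, not from the paper): once features are mapped to independent latents, each coordinate's importance is the variance of its additive component, and these scores sum to the total predictive variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent standard-normal latents, as produced by the transport step.
n, d = 200_000, 3
Z = rng.standard_normal((n, d))

# Hypothetical latent additive model f(z) = f1(z1) + f2(z2) + f3(z3).
parts = [2.0 * Z[:, 0], np.sin(Z[:, 1]), 0.5 * Z[:, 2] ** 2]
f = parts[0] + parts[1] + parts[2]

# Importance of each latent coordinate = variance of its component.
importances = np.array([p.var() for p in parts])

# Because the latents are independent, the scores add up to the
# total predictive variance (up to Monte Carlo error).
print(np.isclose(importances.sum(), f.var(), rtol=0.02))  # True
```

With correlated original features this additivity fails, which is the correlation-induced bias the paper identifies; performing the decomposition in the latent space restores it.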