🤖 AI Summary
This work addresses the lack of a unified R framework for computing conditional feature importance and conducting associated statistical inference, a gap that hinders reliable interpretation of machine learning models. To bridge it, the authors introduce xplainfi, an R package built on the mlr3 ecosystem that features a modular conditional sampling architecture. It integrates diverse samplers, including Gaussian approximations, adversarial random forests, conditional inference trees, and knockoffs, making it suitable for both continuous and mixed-type data. The package supports multiple global importance measures such as permutation importance, conditional and marginal Shapley values, and leave-one-covariate-out methods. Rigorous statistical inference is enabled through variance-corrected confidence intervals and a conditional predictive impact framework. Empirical evaluations demonstrate that xplainfi yields importance scores consistent with existing implementations while maintaining competitive computational efficiency. The package is publicly available on CRAN.
📝 Abstract
We introduce xplainfi, an R package built on top of the mlr3 ecosystem that provides global, loss-based feature importance methods for machine learning models. Various feature importance methods are implemented in R, but significant gaps remain, particularly regarding conditional importance methods and associated statistical inference procedures. The package implements permutation feature importance, conditional feature importance, relative feature importance, leave-one-covariate-out, and generalizations thereof, as well as both marginal and conditional Shapley additive global importance methods. It provides a modular conditional sampling architecture based on Gaussian distributions, adversarial random forests, conditional inference trees, and knockoff-based samplers, enabling conditional importance analysis for continuous and mixed-type data. Statistical inference is available through multiple approaches, including variance-corrected confidence intervals and the conditional predictive impact framework. We demonstrate that xplainfi produces importance scores consistent with existing implementations across multiple simulation settings and learner types, while offering competitive runtime performance. The package is available on CRAN and provides researchers and practitioners with a comprehensive toolkit for feature importance analysis and model interpretation in R.