🤖 AI Summary
Existing feature importance methods struggle to characterize higher-order feature interactions, while mainstream interaction measures either have narrow applicability or prohibitive computational cost, and none support distribution-free statistical inference. This paper proposes iLOCO, a model-agnostic framework for quantifying the importance of higher-order feature interactions. Its key contributions are: (1) the first interaction importance definition grounded in the Leave-One-Covariate-Out (LOCO) paradigm; (2) the first distribution-free, assumption-light nonparametric construction of confidence intervals for interaction effects; and (3) an efficient ensemble-based computational scheme that enables interpretable assessment and significance testing of higher-order interactions. Extensive experiments on synthetic and real-world datasets show that iLOCO substantially outperforms existing approaches and provides, for the first time, reliable and verifiable statistical inference for interaction effects, while maintaining both computational efficiency and statistical rigor.
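To make the "distribution-free, assumption-light confidence intervals" concrete, here is a minimal sketch of one common LOCO-style inference construction: a normal-approximation interval for the mean per-sample excess loss on a held-out set. The function name and all details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def loco_ci(err_full, err_drop):
    """Sketch (not the paper's code): 90% CLT-based confidence interval
    for the mean per-sample excess loss incurred by dropping a feature.
    `err_full` and `err_drop` are per-observation held-out losses of the
    full model and the refit model with the feature removed."""
    d = np.asarray(err_drop) - np.asarray(err_full)  # per-sample differences
    n = d.size
    z = 1.6448536269514722  # standard-normal 95th percentile (90% two-sided CI)
    half = z * d.std(ddof=1) / np.sqrt(n)
    return d.mean() - half, d.mean() + half
```

If the interval excludes zero, the feature's excess loss is statistically significant at the corresponding level; the same idea extends to the interaction scores below by differencing interaction-level losses.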
📝 Abstract
Feature importance measures are widely studied and are essential for understanding model behavior, guiding feature selection, and enhancing interpretability. However, many fitted machine learning models involve complex, higher-order interactions between features. Existing feature importance metrics fail to capture these higher-order effects, existing interaction metrics often suffer from limited applicability or excessive computation, and no methods exist to conduct statistical inference for feature interactions. To bridge this gap, we first propose a new model-agnostic metric, interaction Leave-One-Covariate-Out (iLOCO), for measuring the importance of higher-order feature interactions. Next, we leverage recent advances in LOCO inference to develop distribution-free and assumption-light confidence intervals for our iLOCO metric. To address computational challenges, we also introduce an ensemble learning method for calculating the iLOCO metric and confidence intervals that we show is both computationally and statistically efficient. We validate our iLOCO metric and our confidence intervals on both synthetic and real data sets, showing that our approach outperforms existing methods and provides the first inferential approach to detecting feature interactions.
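The LOCO-style interaction idea described above can be sketched in a few lines: compare the held-out error increase from dropping a pair of features jointly against the sum of the increases from dropping each alone. The score form, function names, and choice of random-forest model below are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of an iLOCO-style interaction score via refitting (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def drop_error(X_tr, y_tr, X_te, y_te, drop):
    """Held-out MSE of a model refit with the columns in `drop` removed."""
    keep = [j for j in range(X_tr.shape[1]) if j not in drop]
    m = RandomForestRegressor(n_estimators=100, random_state=0)
    m.fit(X_tr[:, keep], y_tr)
    return np.mean((y_te - m.predict(X_te[:, keep])) ** 2)

def iloco_score(X, y, j, k, seed=0):
    """Excess error from dropping j, plus excess from dropping k, minus the
    excess from dropping {j, k} jointly: roughly zero when the two features
    act additively, and positive when they interact."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    full = drop_error(X_tr, y_tr, X_te, y_te, set())
    d_j = drop_error(X_tr, y_tr, X_te, y_te, {j}) - full
    d_k = drop_error(X_tr, y_tr, X_te, y_te, {k}) - full
    d_jk = drop_error(X_tr, y_tr, X_te, y_te, {j, k}) - full
    return d_j + d_k - d_jk
```

On data generated as y = 3·x0·x1 + noise, the score for the interacting pair (0, 1) is large and positive, while the score for a pair involving a noise feature stays near zero; the confidence-interval machinery then turns such scores into significance statements.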