Fast Estimation of Partial Dependence Functions using Trees

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing partial dependence (PD) estimation methods for tree-based models, such as path-dependent TreeSHAP, can be inconsistent and computationally inefficient when features are correlated. To address this, the authors propose FastPD, a consistent and path-independent PD estimator for tree-based models. FastPD exploits the decision tree structure, combining piecewise conditional expectation computation with an optimized tree traversal to reduce PD estimation complexity from *O(n²)* to *O(n)* in the number of observations for trees of moderate depth, and the estimator is shown to consistently recover the desired population quantity. Empirical evaluations demonstrate that FastPD outperforms state-of-the-art baselines in accuracy and efficiency across PD curve estimation, SHAP value computation, and higher-order interaction effect quantification. By delivering both statistical reliability and linear-time scalability, FastPD provides a robust, efficient foundation for model interpretability in practical machine learning applications.

📝 Abstract
Many existing interpretation methods are based on Partial Dependence (PD) functions that, for a pre-trained machine learning model, capture how a subset of the features affects the predictions by averaging over the remaining features. Notable methods include Shapley additive explanations (SHAP), which computes feature contributions based on a game theoretical interpretation, and PD plots (i.e., 1-dim PD functions) that capture average marginal main effects. Recent work has connected these approaches using a functional decomposition and argues that SHAP values can be misleading since they merge main and interaction effects into a single local effect. A major advantage of SHAP compared to other PD-based interpretations, however, has been the availability of fast estimation techniques, such as TreeSHAP. In this paper, we propose a new tree-based estimator, FastPD, which efficiently estimates arbitrary PD functions. We show that FastPD consistently estimates the desired population quantity -- in contrast to path-dependent TreeSHAP which is inconsistent when features are correlated. For moderately deep trees, FastPD improves the complexity of existing methods from quadratic to linear in the number of observations. By estimating PD functions for arbitrary feature subsets, FastPD can be used to extract PD-based interpretations such as SHAP, PD plots and higher order interaction effects.
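The PD function described above has a standard brute-force estimator: fix the features of interest at a grid value, average the model's predictions over the remaining features using the observed data, and repeat for each grid value. The sketch below illustrates this baseline (with a hypothetical toy model; `partial_dependence` is our own illustrative name, not the paper's API) and shows why the cost is quadratic when the grid is built from the n observations themselves, which is the regime FastPD improves to linear for tree models.

```python
import numpy as np

def partial_dependence(model, X, feature_idx, grid):
    """Naive PD estimate: for each grid value v of the chosen feature,
    replace that feature by v in every observation and average the
    predictions. Cost: O(len(grid) * n) model evaluations, i.e. O(n^2)
    when the grid consists of the n observed values."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = v   # intervene on the feature of interest
        pd_values.append(model(X_mod).mean())  # average over the rest
    return np.array(pd_values)

# Hypothetical toy model with an interaction: f(x) = x0 + 2*x1 + x0*x1
model = lambda X: X[:, 0] + 2 * X[:, 1] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2.0, 2.0, 5)
pd_curve = partial_dependence(model, X, 0, grid)  # 1-dim PD plot for x0
```

Because x1 is centered near zero here, the estimated PD curve for x0 is approximately linear with slope close to one, the average marginal main effect of x0 under this toy model.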
Problem

Research questions and friction points this paper is trying to address.

Partial Dependence (PD) functions for pre-trained machine learning models are expensive to estimate
Path-dependent TreeSHAP is inconsistent when features are correlated
Existing PD estimation methods scale quadratically in the number of observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

FastPD, a tree-based estimator of PD functions for arbitrary feature subsets
Complexity improved from quadratic to linear in the number of observations for moderately deep trees
Consistent estimation of the population PD even when features are correlated
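To make the tree-based idea concrete, the sketch below shows the general principle behind evaluating a PD function on a single decision tree in one pass: splits on features in the interest set S follow the query point, while splits on the remaining features are averaged with weights proportional to training coverage. This is an illustrative sketch of the generic marginal-expectation tree traversal, not FastPD's actual algorithm; the `Node` layout is a hypothetical minimal structure (real libraries expose trees as flat arrays).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical minimal tree node for illustration only.
    n_samples: int                     # training observations reaching this node
    value: float = 0.0                 # leaf prediction
    feature: Optional[int] = None      # split feature (None for leaves)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_pd(node, x_S, S):
    """PD of one tree at x_S for feature subset S, in a single traversal:
    follow the query point on splits over S, average by coverage on
    splits over the remaining features."""
    if node.feature is None:           # leaf: return its prediction
        return node.value
    if node.feature in S:              # split on a feature of interest
        child = node.left if x_S[node.feature] <= node.threshold else node.right
        return tree_pd(child, x_S, S)
    w_left = node.left.n_samples / node.n_samples
    return (w_left * tree_pd(node.left, x_S, S)
            + (1.0 - w_left) * tree_pd(node.right, x_S, S))

# Example tree: root splits on x0 at 0.5; its right child splits on x1 at 0.0
leaf_a = Node(n_samples=10, value=0.0)
leaf_b = Node(n_samples=4, value=1.0)
leaf_c = Node(n_samples=6, value=3.0)
inner = Node(n_samples=10, feature=1, threshold=0.0, left=leaf_b, right=leaf_c)
root = Node(n_samples=20, feature=0, threshold=0.5, left=leaf_a, right=inner)

# PD for S = {0} at x0 = 1.0: follow the x0 split, average over the x1 split
pd_val = tree_pd(root, {0: 1.0}, {0})  # 0.4 * 1.0 + 0.6 * 3.0 = 2.2
```

Each query touches every tree node at most once, so a single PD evaluation is linear in tree size rather than requiring a full re-prediction over all background observations.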
Jinyang Liu
University of Copenhagen, Denmark
Tessa Steensgaard
University of Copenhagen, Denmark
Marvin N. Wright
Leibniz Institute for Prevention Research and Epidemiology – BIPS & University of Bremen
interpretable machine learning, biostatistics
Niklas Pfister
Associate Professor, University of Copenhagen
M. Hiabu
University of Copenhagen, Denmark