🤖 AI Summary
Problem: Tree-based models are widely deployed in high-stakes applications but lack reliable local (sample-specific) interpretability. Existing local methods such as LIME and TreeSHAP rely on approximations that ignore the model's internal structure and depend on potentially unstable perturbations, while global methods such as MDI+ cannot explain individual predictions when samples are heterogeneous.
Method: We propose Local MDI+ (LMDI+), a novel sample-level feature importance method that extends the MDI+ framework to the local setting. Leveraging the equivalence between decision trees and linear models on a transformed node basis, LMDI+ derives each instance's attributions from the internal structure of the random forest, combined with a local weighting mechanism, rather than from perturbations.
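Contribution/Results: LMDI+ supports local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups. Evaluated on twelve real-world benchmark datasets, it outperforms LIME and TreeSHAP at identifying instance-specific signal features, with an average 10% improvement in downstream task performance, and it produces more consistent instance-level feature importance rankings across multiple random forest fits.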
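To make the notion of ranking stability concrete, here is a minimal sketch of one way to quantify it for a single sample: the mean pairwise Spearman rank correlation between the local importance vectors obtained from several model fits. The metric choice, the function names (`ranks`, `spearman`, `ranking_stability`), and the score values are all assumptions for illustration; the source does not specify how stability was measured.

```python
from itertools import combinations


def ranks(scores):
    """Rank features by descending score (1 = most important); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    out = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        out[j] = rank
    return out


def spearman(a, b):
    """Spearman rank correlation for two tie-free score vectors of equal length."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def ranking_stability(scores_per_fit):
    """Mean pairwise Spearman correlation across fits, for one sample."""
    pairs = list(combinations(scores_per_fit, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical local importance scores for one sample from three model fits.
stable = [[0.9, 0.5, 0.1, 0.3], [0.8, 0.6, 0.2, 0.3], [0.7, 0.4, 0.1, 0.2]]
unstable = [[0.9, 0.5, 0.1, 0.3], [0.1, 0.6, 0.9, 0.3], [0.4, 0.2, 0.1, 0.8]]

print(ranking_stability(stable), ranking_stability(unstable))
```

A value near 1 means the fits agree on the feature ordering for that sample; averaging this quantity over samples would give a dataset-level summary.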
📝 Abstract
Tree-based ensembles such as random forests remain the go-to choice for tabular data over deep learning models due to their predictive performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e., sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method that exploits an equivalence between decision trees and linear models on a transformed node basis. Yet global MDI+ scores cannot explain individual predictions when samples have heterogeneous characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample-specific setting. LMDI+ outperforms the existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.
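The equivalence between a decision tree and a linear model on a node basis can be checked on a toy example. The sketch below assumes one commonly used form of the basis: each internal node t with n_l left and n_r right training samples contributes a feature equal to n_r/sqrt(n_l n_r) for points sent left, -n_l/sqrt(n_l n_r) for points sent right, and 0 for points that never reach t. The data, the fixed tree, and all function names are hypothetical, and the per-feature grouping in `local_terms` only illustrates how such a representation lends itself to sample-specific attribution; it is not presented as the LMDI+ score itself.

```python
from math import sqrt

# Toy training set (illustrative values): two features, six samples.
X = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (6.0, 2.0), (7.0, 8.0), (8.0, 3.0)]
y = [1.0, 2.0, 4.0, 7.0, 11.0, 9.0]
ALL = list(range(len(X)))

# Fixed tree: (feature index, threshold, left subtree, right subtree); None is a leaf.
TREE = (0, 4.5, (1, 3.0, None, None), (1, 5.0, None, None))


def tree_predict(x, node=TREE, idx=ALL):
    """Usual tree prediction: mean response of the training samples in x's leaf."""
    if node is None:
        return sum(y[i] for i in idx) / len(idx)
    feat, thr, left, right = node
    go_left = x[feat] <= thr
    kept = [i for i in idx if (X[i][feat] <= thr) == go_left]
    return tree_predict(x, left if go_left else right, kept)


def node_basis(x, node=TREE, idx=ALL, reached=True):
    """Return [(split feature, psi_t(x))] for every internal node t, in pre-order."""
    if node is None:
        return []
    feat, thr, left, right = node
    l_idx = [i for i in idx if X[i][feat] <= thr]
    r_idx = [i for i in idx if X[i][feat] > thr]
    n_l, n_r = len(l_idx), len(r_idx)
    go_left = x[feat] <= thr
    # Assumed normalization: +n_r/sqrt(n_l*n_r) on the left, -n_l/sqrt(n_l*n_r) on the right.
    psi = 0.0 if not reached else (n_r if go_left else -n_l) / sqrt(n_l * n_r)
    return ([(feat, psi)]
            + node_basis(x, left, l_idx, reached and go_left)
            + node_basis(x, right, r_idx, reached and not go_left))


# Design matrix on the training data. Its columns are mutually orthogonal and sum to
# zero, so least squares with an intercept reduces to one projection per column.
PSI = [[p for _, p in node_basis(x)] for x in X]
Y_BAR = sum(y) / len(y)
BETA = []
for j in range(len(PSI[0])):
    col = [row[j] for row in PSI]
    BETA.append(sum(c * t for c, t in zip(col, y)) / sum(c * c for c in col))


def local_terms(x):
    """Per-feature sums of beta_t * psi_t(x): a sample-specific decomposition."""
    terms = {}
    for (feat, psi), beta in zip(node_basis(x), BETA):
        terms[feat] = terms.get(feat, 0.0) + beta * psi
    return terms


def linear_predict(x):
    """Intercept plus the linear model on the node basis."""
    return Y_BAR + sum(local_terms(x).values())


x_new = (5.0, 6.0)
print(tree_predict(x_new), linear_predict(x_new), local_terms(x_new))
```

The two predictions agree for every input because each term beta_t * psi_t(x) equals the change in the node mean when x moves from node t to the child it falls into, so the sum telescopes from the overall mean down to the leaf mean.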