TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unreliability of conventional feature attribution methods, such as those based on Shapley or Banzhaf values, when jointly optimizing the insertion and deletion metrics, a limitation the authors trace to the linearity axiom. To overcome it, they propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in O(L) time for decision trees with L leaves; these gradients include weighted Banzhaf values. Building on TreeGrad, they introduce TreeGrad-Ranker, which aggregates the gradients to produce feature rankings whose scores satisfy all axioms uniquely characterizing probabilistic values except linearity, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. As a by-product, they also develop TreeProb, which generalizes Linear TreeShap to all probabilistic values. Empirically, TreeGrad-Ranker significantly outperforms existing methods on both insertion and deletion benchmarks, and the numerical error of Linear TreeShap when computing the Shapley value can be up to 10¹⁵ times larger than that of TreeGrad-Shap.
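The insertion and deletion metrics mentioned above are not defined in this summary; a common convention in the explainability literature is to reveal (insertion) or mask (deletion) features of an input in ranked order while tracking the model's output. The sketch below illustrates that convention with a generic `predict` function and a `baseline` vector standing in for masked features; both names and the exact masking scheme are assumptions, not details taken from the paper.

```python
def insertion_deletion_curves(predict, x, baseline, ranking):
    """Illustrative insertion/deletion curves for a feature ranking.

    Assumed convention (not from the paper's code):
    - insertion: start from `baseline`, reveal features of `x` in
      ranked order, and record the model output after each step;
    - deletion: start from `x`, replace features with `baseline`
      in the same order, recording the output likewise.
    """
    ins = list(baseline)   # current input for the insertion curve
    dele = list(x)         # current input for the deletion curve
    ins_curve = [predict(ins)]
    del_curve = [predict(dele)]
    for i in ranking:      # ranking: feature indices, best first
        ins[i] = x[i]
        dele[i] = baseline[i]
        ins_curve.append(predict(ins))
        del_curve.append(predict(dele))
    return ins_curve, del_curve

# Toy model: the prediction is just the sum of the features.
ins_curve, del_curve = insertion_deletion_curves(
    predict=sum, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0],
    ranking=[2, 0, 1])
```

A good ranking makes the insertion curve rise quickly and the deletion curve fall quickly, which is why the paper frames co-optimizing the two as a joint subset-selection objective.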

📝 Abstract
We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.
Problem

Research questions and friction points this paper is trying to address.

feature ranking
decision trees
probabilistic values
Shapley values
local explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

TreeGrad
feature ranking
probabilistic values
numerical stability
decision trees