🤖 AI Summary
To address the inherent trade-off between high estimation variance in decision trees and the lack of interpretability in random forests, this paper proposes a novel, interpretable embedding method based on leaf-node means. Specifically, it constructs a continuous embedding space using the leaf partitions of classification trees, mapping each input to the mean feature vector of its corresponding leaf region—thereby substantially reducing estimation variance. The method further integrates bootstrap aggregation with linear discriminant analysis (LDA) to jointly optimize predictive accuracy, computational efficiency, and model transparency. Theoretically grounded and empirically validated, the approach achieves classification accuracy on par with or surpassing that of random forests and shallow neural networks across multiple synthetic and real-world benchmark datasets. Moreover, it offers significantly faster training, strong generalization, low computational overhead, and explicit, semantically meaningful decision rules—effectively reconciling fidelity, efficiency, and interpretability in tree-based learning.
📝 Abstract
Decision trees and random forests remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretability. However, a single tree suffers from high estimation variance, while large ensembles reduce this variance at the cost of substantial computational overhead and diminished interpretability. In this paper, we propose Decision Tree Embedding (DTE), a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure, effectively circumventing the high variance inherent in decision-tree splitting rules. We further introduce an ensemble extension based on additional bootstrap trees, and pair the resulting embedding with linear discriminant analysis for classification. We establish several population-level theoretical properties of DTE, including its preservation of conditional density under mild conditions and a characterization of the resulting classification error. Empirical studies on synthetic and real datasets demonstrate that DTE strikes a strong balance between accuracy and computational efficiency, outperforming or matching random forests and shallow neural networks while requiring only a fraction of their training time in most cases. Overall, the proposed DTE method can be viewed either as a scalable decision tree classifier that improves upon standard split rules, or as a neural network model whose weights are learned from tree-derived anchor points, achieving an intriguing integration of both paradigms.
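The single-tree version of the pipeline described above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's implementation: it assumes a scikit-learn classification tree, uses per-leaf training means as anchor points, maps every input to the mean of the leaf it falls into, and fits LDA on that embedding. The bootstrap-ensemble extension and the theoretical guarantees are not reproduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data stand-in; the paper evaluates on synthetic and real benchmarks.
X, y = load_iris(return_X_y=True)

# 1) Fit a classification tree to obtain a partition of the feature space.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)  # leaf index of each training sample

# 2) Anchor points: the sample mean of the features within each leaf region.
anchors = {leaf: X[leaf_ids == leaf].mean(axis=0)
           for leaf in np.unique(leaf_ids)}

def embed(Z):
    """Map each input to the mean feature vector of its leaf (the DTE idea)."""
    return np.vstack([anchors[leaf] for leaf in tree.apply(Z)])

# 3) Pair the embedding with linear discriminant analysis for classification.
lda = LinearDiscriminantAnalysis().fit(embed(X), y)
```

Because the embedding replaces each point by a leaf-level average, small perturbations of the splitting thresholds no longer move individual predictions, which is how the construction sidesteps the variance of the raw split rules.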