End-to-end Feature Selection Approach for Learning Skinny Trees

📅 2023-10-28

🏛️ International Conference on Artificial Intelligence and Statistics

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Post-hoc feature selection based on importance scores for tree ensembles (e.g., gradient-boosted trees, random forests) often yields redundant feature subsets whose size is hard to control. To address this, we propose “Skinny Trees”, an end-to-end trainable framework that jointly optimizes tree structure and sparse feature selection. Our core contributions are: (1) a differentiable tree ensemble training paradigm incorporating group ℓ₀ regularization to enforce feature-level sparsity; (2) a dense-to-sparse regularization scheduling strategy that can yield more expressive and sparser ensembles; and (3) a first-order optimization algorithm with convergence guarantees. Evaluated on 15 synthetic and real-world datasets, Skinny Trees achieves 1.5× to 620× feature compression and up to 10× faster inference than dense trees. Under a 25% feature budget, it outperforms LightGBM by 10.2% AUC (up to 37.7%) and random forests by 3% AUC (up to 12.5%).

📝 Abstract

We propose a new optimization-based approach for feature selection in tree ensembles, an important problem in statistics and machine learning. Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests, support post-training feature selection based on feature importance scores; although very popular, these approaches are known to have drawbacks. We propose Skinny Trees: an end-to-end toolkit for feature selection in tree ensembles where we train a tree ensemble while controlling the number of selected features. Our optimization-based approach learns an ensemble of differentiable trees and simultaneously performs feature selection using a grouped $\ell_0$-regularizer. We use first-order methods for optimization and present convergence guarantees for our approach. We use a dense-to-sparse regularization scheduling scheme that can lead to more expressive and sparser tree ensembles. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ to $620\times$ feature compression rates, leading to up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees leads to better feature selection than many existing toolkits: for example, in terms of AUC performance at a 25% feature budget, Skinny Trees outperforms LightGBM by $10.2\%$ (up to $37.7\%$), and Random Forests by $3\%$ (up to $12.5\%$).
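To make the grouped $\ell_0$ idea concrete, here is a minimal sketch of one building block that first-order methods commonly pair with such a penalty: a group hard-thresholding (proximal) step. It assumes each input feature owns one row of a weight matrix shared across all tree split nodes, so zeroing a row removes that feature from the whole ensemble. The function name `group_l0_prox`, the parameters `lam` and `step`, and the toy matrix are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def group_l0_prox(W, lam, step):
    """Group hard-thresholding: proximal step for a penalty lam * (#nonzero rows).

    W has shape (n_features, n_split_nodes); row j holds every weight that
    touches feature j (an assumed layout, for illustration only).
    A row survives only if keeping it is worth more than the penalty,
    i.e. 0.5 * ||row||^2 > step * lam.
    """
    threshold = np.sqrt(2.0 * step * lam)
    row_norms = np.linalg.norm(W, axis=1)
    keep = row_norms > threshold
    # Zero out whole rows so the corresponding features are never read.
    return W * keep[:, None], keep

# Toy weights: 4 features, 2 split nodes.
W = np.array([[0.90, -0.40],
              [0.01,  0.02],
              [0.00,  0.50],
              [0.05, -0.03]])
W_sparse, selected = group_l0_prox(W, lam=0.5, step=0.1)
print(selected)  # boolean mask of features that remain in use
```

In a training loop, a step like this would typically follow each gradient update on the loss, so feature selection happens during training rather than after it.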

Problem

Research questions and friction points this paper is trying to address.

Optimizes feature selection in tree ensembles

Trains trees with controlled feature count

Improves speed and performance over existing methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end feature selection for tree ensembles

Grouped L0-regularizer for simultaneous feature selection

Dense-to-sparse regularization scheduling scheme
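The dense-to-sparse idea in the list above can be illustrated with a simple penalty ramp: training starts with little or no sparsity pressure and the pressure grows over time. This page does not specify the schedule the paper actually uses, so the linear ramp below, along with the names `ramp_penalty`, `lam_max`, and `warmup_frac`, is a hypothetical stand-in rather than the authors' scheme.

```python
def ramp_penalty(epoch, n_epochs, lam_max, warmup_frac=0.5):
    """Linearly increase the sparsity penalty from 0 to lam_max.

    Assumed schedule for illustration: the ensemble trains densely at
    first (small penalty), then the penalty is held at lam_max once the
    warm-up portion of training is over.
    """
    warmup = max(1, int(warmup_frac * n_epochs))
    return lam_max * min(1.0, epoch / warmup)

# Penalty used at each of 10 epochs, with the ramp ending at epoch 5.
schedule = [ramp_penalty(e, n_epochs=10, lam_max=2.0) for e in range(10)]
print(schedule)
```

Each epoch's value would be passed as the penalty strength to whatever sparsifying step the optimizer applies, so early epochs explore with all features and later epochs prune.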

🔎 Similar Papers

Effective Subset Selection Through The Lens of Neural Network Pruning