AI Summary
Optimal sparse decision tree learning faces two obstacles: depth-first search methods require a maximum-depth hyperparameter and become inefficient for deep trees, while best-first search methods incur high memory overhead. Method: We propose Branches, a novel AO*-type algorithm that formulates decision tree construction as an AND/OR graph search problem. It combines dynamic programming with pruning and exploits the problem's optimal substructure, guaranteeing global optimality while reducing time complexity. Branches natively supports non-binary features, which yields additional computational gains. Contribution/Results: Compared to state-of-the-art methods, Branches improves search efficiency for deep trees, consumes less memory than conventional best-first algorithms, and achieves better trade-offs among predictive accuracy, computational efficiency, and model interpretability, as validated empirically on benchmark datasets.
Abstract
Decision Tree (DT) Learning is a fundamental problem in Interpretable Machine Learning, yet it poses a formidable optimisation challenge. Practical algorithms have recently emerged, primarily leveraging Dynamic Programming and Branch&Bound. However, most of these approaches rely on a Depth-First-Search strategy, which is inefficient when searching for DTs at high depths and requires the definition of a maximum depth hyperparameter. Other methods employ Best-First-Search to circumvent these issues. The downside of this strategy is its higher memory consumption; it therefore has to be designed efficiently, taking full advantage of the problem's structure. We formulate the problem as an AND/OR graph search, which we solve with a novel AO*-type algorithm called Branches. We prove both optimality and complexity guarantees for Branches, and we show that it is more efficient than the state of the art, both theoretically and on a variety of experiments. Furthermore, unlike the other methods, Branches supports non-binary features, and we show that this property can induce even larger gains in computational efficiency.
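To make the AND/OR view concrete, the following is a minimal sketch of the search space only, not the Branches algorithm itself. It replaces the AO*-type heuristic search with plain memoised exhaustive recursion, which is feasible only on tiny inputs. The toy dataset, the penalty `LAMBDA`, the objective (misclassified fraction plus a per-leaf penalty), and the function name `solve` are all assumptions made for illustration. Each sub-problem is an OR node (stop with a leaf, or pick one feature to split on), and each split is an AND node (every child sub-problem must be solved). Feature 0 takes three values, so a split on it creates three children without any binarisation.

```python
from functools import lru_cache

# Hypothetical toy data: two categorical features, binary labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
y = [0, 0, 1, 1, 0, 1]
LAMBDA = 0.05  # assumed complexity penalty paid by every leaf


@lru_cache(maxsize=None)
def solve(indices):
    """Return (cost, tree) for the sub-problem given by a frozenset of row indices.

    cost = (misclassified rows in this sub-problem) / len(y) + LAMBDA * (leaves).
    Identical sub-problems reached through different split orders are solved once.
    """
    rows = sorted(indices)
    ones = sum(y[i] for i in rows)
    majority = 1 if 2 * ones > len(rows) else 0
    errors = sum(1 for i in rows if y[i] != majority)

    # OR node, option 1: stop here and predict the majority label.
    best = (errors / len(y) + LAMBDA, ("leaf", majority))

    # OR node, remaining options: split on one feature.
    for f in range(len(X[0])):
        groups = {}
        for i in rows:
            groups.setdefault(X[i][f], []).append(i)
        if len(groups) < 2:
            continue  # feature is constant here, so the split would not progress

        # AND node: the split is solved only when all of its children are solved.
        total, children = 0.0, {}
        for value, members in groups.items():
            child_cost, child_tree = solve(frozenset(members))
            total += child_cost
            children[value] = child_tree
        if total < best[0]:
            best = (total, ("split", f, children))
    return best


cost, tree = solve(frozenset(range(len(y))))
print(round(cost, 4), tree)
```

On this toy input the cheapest solution splits on the three-valued feature first and refines only the one impure child. A heuristic search such as AO* would aim to reach the same optimum while expanding fewer sub-problems than this exhaustive recursion.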