🤖 AI Summary
This work addresses two weaknesses of state-of-the-art optimal decision tree learning methods on continuous features: their high computational cost, and the poor quality of the solutions they return when interrupted, a consequence of depth-first search strategies that often yield highly unbalanced trees. The authors propose an anytime yet complete algorithm based on limited discrepancy search that guarantees eventual optimality while significantly improving solution quality under arbitrary time budgets by distributing computational effort more evenly across the tree. The method combines continuous-feature split optimization with limited discrepancy search, avoiding the premature commitment to suboptimal tree structures common in depth-first approaches. Experimental results show that the proposed algorithm consistently outperforms existing state-of-the-art methods across all computational budgets, and excels in particular under tight resource constraints by producing higher-quality, more balanced decision trees.
📝 Abstract
In recent years, significant progress has been made on algorithms for learning optimal decision trees, primarily in the context of binary features. Extending these methods to continuous features remains substantially more challenging due to the large number of potential splits for each feature. Recently, an elegant exact algorithm was proposed for learning optimal decision trees with continuous features; however, its rapidly increasing computational time limits its practical applicability to shallow depths (typically 3 or 4). It relies on a depth-first search optimization strategy that fully optimizes the left subtree of each split before exploring the corresponding right subtree. While effective in finding optimal solutions given sufficient time, this strategy can lead to poor anytime behavior: when interrupted early, the best-found tree is often highly unbalanced and suboptimal. In such cases, purely greedy methods such as C4.5 may, paradoxically, yield better solutions. To address this limitation, we propose an anytime yet complete approach based on limited discrepancy search, which distributes the computational effort more evenly across the entire tree structure and thus ensures that a high-quality decision tree is available at any interruption point. Experimental results show that our approach outperforms the existing one in terms of anytime performance.
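To make the contrast with depth-first search concrete, here is a minimal, hypothetical sketch of limited discrepancy search on a generic sequence of binary choices (not the authors' algorithm). The heuristic choice at each node is encoded as 0 and a deviation as 1; LDS visits choice sequences in order of increasing discrepancy count, so paths that mostly follow the heuristic are explored first, rather than exhausting one subtree before touching the other. The naive re-enumeration per budget is for clarity only; practical implementations prune by remaining budget.

```python
from itertools import product

def lds_order(depth):
    """Enumerate all binary choice sequences of a given depth in
    limited-discrepancy order: sequences deviating from the heuristic
    choice (bit 0) fewer times are yielded first."""
    for budget in range(depth + 1):        # discrepancy budget 0, 1, ..., depth
        for seq in product((0, 1), repeat=depth):
            if sum(seq) == budget:         # exactly `budget` deviations
                yield seq

# Depth-first order at depth 3 would fully exhaust the 0-prefixed half
# (e.g. visiting (0,1,1) before (1,0,0)); LDS instead reaches every
# single-discrepancy sequence before any two-discrepancy one.
print(list(lds_order(3)))
```

Applied to decision tree search, each choice corresponds to a branching decision in the optimization, so effort is spread across the whole tree structure and an early interruption still leaves a reasonably balanced incumbent.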