🤖 AI Summary
Traditional differentiable decision trees for regression tasks struggle to jointly optimize internal and leaf nodes due to reliance on approximation strategies such as boundary smoothing or gradient quantization, often leading to overfitting. This work proposes DTSemNet, a novel framework that exactly represents hard oblique decision trees through a semantically equivalent and invertible neural architecture, enabling end-to-end gradient-based training without approximations. To further enhance gradient precision for regression, the method introduces an annealed Top-k mechanism that provides accurate routing signals during training. DTSemNet is the first approach to achieve approximation-free differentiable training of oblique decision trees, outperforming existing methods on both classification and regression benchmarks. Moreover, it demonstrates practical utility in reinforcement learning by serving as an interpretable, programmatic policy.
📝 Abstract
Decision Trees (DTs) are widely used in safety-critical domains such as medical diagnosis, valued for their interpretability and effectiveness on tabular data. However, training accurate oblique DTs is challenging due to complex optimization landscapes and overfitting risks, particularly in regression. Recent advances have introduced differentiable formulations that enable gradient-based training and joint optimization of decision boundaries and leaf regressors. Yet, existing approaches typically rely on approximations, either through probabilistic softening of boundaries (soft DTs) or quantized gradients such as the Straight-Through Estimator (STE). To overcome these limitations, we propose DTSemNet, a novel, semantically equivalent, and invertible representation of hard oblique DTs as neural networks. DTSemNet enables end-to-end training with standard gradient descent, eliminating the need for approximations in both classification and regression. While classification aligns naturally with this formulation, regression remains challenging due to the joint optimization of internal nodes and leaf regressors. To address this, we analyze the limitations of STE and introduce an annealed Top-k method that provides accurate gradient signals without approximation. Extensive experiments on classification and regression benchmarks show that DTSemNet-trained oblique DTs outperform state-of-the-art differentiable DTs. Furthermore, we demonstrate that DTSemNet can serve as programmatic DT policies in reinforcement learning environments, thereby broadening their applicability.