🤖 AI Summary
Learning high-quality oblique splits in decision trees is an NP-hard problem, and existing approaches rely either on inefficient search procedures or on heuristic strategies lacking theoretical guarantees. This work proposes the Hinge Regression Tree model, which formulates oblique splitting as a nonlinear least-squares problem over the max/min envelope of two linear predictors. The envelope gives the model ReLU-like expressivity, and the alternating fitting procedure used to learn it is, within fixed partitions, equivalent to a damped Newton (Gauss–Newton) method. The authors establish theoretically that the model is a universal approximator with an explicit approximation rate of O(δ²). Algorithmically, they develop a backtracking line-search variant with monotonic convergence guarantees, built on Gauss–Newton optimization with optional ridge regularization. Experiments demonstrate that the method matches or surpasses existing single-tree baselines on both synthetic and real-world datasets while using more compact tree structures and exhibiting fast, stable convergence.
📝 Abstract
Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a nonlinear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss–Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
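To make the core idea concrete, the sketch below illustrates the kind of hinge model the abstract describes: fitting $y \approx \max(x^\top w_1,\, x^\top w_2)$ by alternating between (1) assigning each sample to the linear piece that is currently active and (2) refitting each piece by ordinary least squares. This is a simplified illustration of alternating fitting on the max envelope, not the paper's actual HRT algorithm (which uses damped Gauss–Newton steps with line search and optional ridge regularization); all function names and the warm-start choice are our own.

```python
# Hypothetical sketch: alternating least-squares fit of the max envelope
# y ~ max(X @ w1, X @ w2) of two linear predictors. Not the paper's HRT
# algorithm -- a plain assign-then-refit illustration of the idea.
import numpy as np

def fit_hinge_max(X, y, n_iter=30):
    # Warm start: split samples at the median of the first feature and
    # fit one linear piece per half by ordinary least squares.
    order = np.argsort(X[:, 0])
    half = len(order) // 2
    w1, *_ = np.linalg.lstsq(X[order[half:]], y[order[half:]], rcond=None)
    w2, *_ = np.linalg.lstsq(X[order[:half]], y[order[:half]], rcond=None)
    for _ in range(n_iter):
        active = X @ w1 >= X @ w2          # which piece is on top per sample
        if active.all() or (~active).all():
            break                          # envelope degenerated to one line
        w1, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        w2, *_ = np.linalg.lstsq(X[~active], y[~active], rcond=None)
    return w1, w2

# Toy data: y is exactly the max of two linear functions (a ReLU-like kink).
rng = np.random.default_rng(1)
X = np.c_[rng.uniform(-1, 1, 400), np.ones(400)]    # feature + bias column
y = np.maximum(X @ np.array([2.0, 0.0]), X @ np.array([-1.0, 0.5]))

w1, w2 = fit_hinge_max(X, y)
mse = float(np.mean((np.maximum(X @ w1, X @ w2) - y) ** 2))
print(mse)  # near zero when both pieces are recovered
```

Each refit step only ever lowers the squared error on its current assignment, which is the intuition behind the monotone-decrease guarantee the paper proves for its line-search variant of the node-level solver.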