🤖 AI Summary
Learning high-quality oblique splits in decision trees is an NP-hard problem, and existing approaches rely either on inefficient search procedures or on heuristic strategies lacking theoretical guarantees. This work proposes the Hinge Regression Tree model, which formulates oblique splitting as a nonlinear least-squares problem over the max/min envelope of two linear predictors. The envelope gives the model ReLU-like expressivity, and the alternating fitting procedure used to learn it is, within fixed partitions, equivalent to a damped Newton (Gauss–Newton) method. The authors establish theoretically that the model is a universal approximator with an explicit approximation rate of O(δ²). Algorithmically, they develop a backtracking line-search variant with monotonic convergence guarantees, built on Gauss–Newton optimization with optional ridge regularization. Experiments demonstrate that the method matches or surpasses existing single-tree baselines on both synthetic and real-world datasets while using more compact tree structures and exhibiting fast, stable convergence.
📝 Abstract
Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a nonlinear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss–Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
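To make the core idea concrete, the sketch below illustrates the kind of hinge model the abstract describes: fitting $y \approx \max(x^\top w_1,\, x^\top w_2)$ by alternating between (1) assigning each sample to the linear piece that is currently active and (2) refitting each piece by ordinary least squares. This is a simplified illustration of alternating fitting on the max envelope, not the paper's actual HRT algorithm (which uses damped Gauss–Newton steps with line search and optional ridge regularization); all function names and the warm-start choice are our own.

```python
# Hypothetical sketch: alternating least-squares fit of the max envelope
# y ~ max(X @ w1, X @ w2) of two linear predictors. Not the paper's HRT
# algorithm -- a plain assign-then-refit illustration of the idea.
import numpy as np

def fit_hinge_max(X, y, n_iter=30):
    # Warm start: split samples at the median of the first feature and
    # fit one linear piece per half by ordinary least squares.
    order = np.argsort(X[:, 0])
    half = len(order) // 2
    w1, *_ = np.linalg.lstsq(X[order[half:]], y[order[half:]], rcond=None)
    w2, *_ = np.linalg.lstsq(X[order[:half]], y[order[:half]], rcond=None)
    for _ in range(n_iter):
        active = X @ w1 >= X @ w2          # which piece is on top per sample
        if active.all() or (~active).all():
            break                          # envelope degenerated to one line
        w1, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        w2, *_ = np.linalg.lstsq(X[~active], y[~active], rcond=None)
    return w1, w2

# Toy data: y is exactly the max of two linear functions (a ReLU-like kink).
rng = np.random.default_rng(1)
X = np.c_[rng.uniform(-1, 1, 400), np.ones(400)]    # feature + bias column
y = np.maximum(X @ np.array([2.0, 0.0]), X @ np.array([-1.0, 0.5]))

w1, w2 = fit_hinge_max(X, y)
mse = float(np.mean((np.maximum(X @ w1, X @ w2) - y) ** 2))
print(mse)  # near zero when both pieces are recovered
```

Each refit step only ever lowers the squared error on its current assignment, which is the intuition behind the monotone-decrease guarantee the paper proves for its line-search variant of the node-level solver.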