Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Learning high-quality oblique splits in decision trees is NP-hard, and existing approaches rely either on inefficient search procedures or on heuristic strategies lacking theoretical guarantees. This work proposes the Hinge Regression Tree model, which formulates oblique splitting as a nonlinear least-squares problem under the envelope of the maximum or minimum of two linear predictors. The envelope gives the model ReLU-like expressivity, and its alternating fitting procedure is, under fixed partitions, equivalent to a damped Newton method. The authors establish theoretically that the model is a universal approximator with an explicit approximation rate of O(δ²). Algorithmically, they develop a backtracking line-search variant with monotonic convergence guarantees, integrating Gauss–Newton optimization and optional ridge regularization. Experiments demonstrate that the method matches or surpasses existing single-tree baselines on both synthetic and real-world datasets using more compact tree structures, while exhibiting fast and stable convergence.

📝 Abstract
Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss-Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
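The node-level problem the abstract describes, fitting the max envelope of two linear predictors by least squares, can be sketched with the classical alternating scheme: fix the partition induced by the envelope, refit each linear piece on its active set, then repeat. The sketch below is an illustrative assumption, not the paper's damped Gauss–Newton update; the `fit_hinge` function, its median-split initialization, and the toy data are all hypothetical.

```python
import numpy as np

def fit_hinge(X, y, n_iter=50, ridge=1e-6):
    """Fit y ≈ max(X @ a, X @ b) by alternating least squares.

    Sketch of hinging-hyperplane fitting: alternate between
    (1) partitioning samples by which predictor attains the max and
    (2) refitting each predictor on its own partition.
    """
    d = X.shape[1]

    def ls(Xi, yi):
        # ridge-regularized normal equations for one linear piece
        G = Xi.T @ Xi + ridge * np.eye(d)
        return np.linalg.solve(G, Xi.T @ yi)

    # crude initial partition: split on the last feature's median
    mask = X[:, -1] >= np.median(X[:, -1])
    a, b = ls(X[mask], y[mask]), ls(X[~mask], y[~mask])
    for _ in range(n_iter):
        mask = X @ a >= X @ b          # active set of predictor a
        if mask.all() or not mask.any():
            break                      # degenerate partition: stop
        a, b = ls(X[mask], y[mask]), ls(X[~mask], y[~mask])
    return a, b

# toy data: y is exactly the max of two linear pieces of x
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
X = np.c_[np.ones_like(x), x]          # columns: intercept, x
y = np.maximum(x, 0.5 - x)             # kink at x = 0.25
a, b = fit_hinge(X, y)
mse = float(np.mean((np.maximum(X @ a, X @ b) - y) ** 2))
print(f"hinge fit MSE: {mse:.2e}")
```

On this noiseless toy problem the partition stabilizes on the true kink and both pieces are recovered almost exactly; the paper's contribution is precisely to replace this heuristic alternation with a damped (Gauss–)Newton view that carries convergence guarantees.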
Problem

Research questions and friction points this paper is trying to address.

oblique decision trees
oblique splits
NP-hard
decision tree learning
multivariate decision boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Oblique Decision Trees
Newton Method
Non-linear Least Squares
Universal Approximation
Hinge Regression Tree
Hongyi Li
School of Intelligence Science and Engineering, Harbin Institute of Technology, Shenzhen
Han Lin
School of Intelligence Science and Engineering, Harbin Institute of Technology, Shenzhen
Jun Xu
Harbin Institute of Technology, Shenzhen
piecewise linear neural network
system identification
model predictive control