Gradient Regularized Newton Boosting Trees with Global Convergence

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of global convergence guarantees for Newton boosting trees under general convex losses, where the classical algorithm can diverge. The authors propose a gradient-regularized Newton boosting scheme that adds an adaptive ℓ₂ regularization term at each iteration, proportional to the square root of the gradient norm. The analysis builds on Restricted Newton Descent, a framework for Newton's method on Hilbert spaces with inexact iterates, combined with a Lipschitz-Hessian assumption and weak-learner conditions (cosine angle and weak gradient edge). For smooth, strongly convex losses satisfying a Hessian-dominance condition, vanilla Newton boosting is shown to converge linearly. For general convex losses with Lipschitz Hessians, the regularized scheme achieves a convergence rate of 𝒪(1/k²), matching the rate of first-order boosting with Nesterov momentum. Numerical experiments show the proposed scheme converging in cases where vanilla Newton boosting may diverge.

📝 Abstract

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive $\ell_2$-regularization term proportional to the square root of the gradient norm at each iteration. We establish a $\mathcal{O}(\frac{1}{k^2})$ rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.
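The regularization idea in the abstract can be illustrated in the classical finite-dimensional setting, here reduced to one dimension. The sketch below is not the paper's tree-based algorithm: it uses a toy convex loss f(x) = sqrt(1 + x²), chosen for illustration because a plain Newton step on it maps x to -x³ and so diverges for |x| > 1. The regularizer is taken as sqrt(L·|f'(x)|), where the proportionality constant L is an assumption (the abstract only states proportionality to the square root of the gradient norm). For this toy function, L = 1.0 is a valid upper bound on the Lipschitz constant of f''. All function names are hypothetical.

```python
import math


def grad(x):
    # First derivative of the toy loss f(x) = sqrt(1 + x^2).
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    # Second derivative of the toy loss; it decays quickly away from 0,
    # which is what makes the unregularized Newton step overshoot.
    return (1.0 + x * x) ** -1.5


def step(x, lipschitz=None):
    # lipschitz=None -> plain Newton step.
    # Otherwise add lam = sqrt(L * |gradient|) to the curvature
    # (assumed form of the gradient regularization).
    g, h = grad(x), hess(x)
    lam = 0.0 if lipschitz is None else math.sqrt(lipschitz * abs(g))
    return x - g / (h + lam)


def run(x0, n_steps, lipschitz=None):
    # Iterate the chosen step rule from x0 and return the final iterate.
    x = x0
    for _ in range(n_steps):
        x = step(x, lipschitz)
    return x
```

Starting from x0 = 2.0, `run(2.0, 5)` moves rapidly away from the minimizer at 0, while `run(2.0, 50, lipschitz=1.0)` approaches it. The regularizer shrinks with the gradient, so near the minimizer the update behaves almost like an unregularized Newton step.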

Problem

Research questions and friction points this paper is trying to address.

Newton boosting

global convergence

gradient regularization

GBDT

convex optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Newton Boosting

Global Convergence

Gradient Regularization