What Functions Does XGBoost Learn?

📅 2026-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work bridges the gap between the empirical success of XGBoost and its theoretical understanding by characterizing the function class that XGBoost implicitly learns during training. It constructs an infinite-dimensional function space $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ with a corresponding complexity measure $V^{d, s}_{\infty\text{-XGB}}(\cdot)$, and shows that the XGBoost objective is equivalent to a regularized regression problem over this space, yielding the first precise characterization of the learned function class. The class and its complexity measure are further connected to smoothness through the Hardy–Krause variation. Building on this foundation, the paper proves that the least squares estimator over a $V^{d, s}_{\infty\text{-XGB}}$-ball of this class achieves a convergence rate of $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, which is nearly minimax optimal and mitigates the curse of dimensionality.
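Schematically, the reformulation described above casts training as a penalized least squares problem over the extended class; the display below is a sketch only, with the penalty weight $\lambda$ and the empirical-loss normalization as illustrative placeholders rather than the paper's exact statement:

$$
\hat{f} \;\in\; \operatorname*{arg\,min}_{f \,\in\, \mathcal{F}^{d,s}_{\infty\text{-ST}}} \;\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \;+\; \lambda\, V^{d,s}_{\infty\text{-XGB}}(f).
$$

In this reading, every optimizer of the XGBoost objective is also an optimizer of a problem of this penalized form.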

📝 Abstract
This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ with penalty $V^{d, s}_{\infty\text{-XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ and $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ in terms of Hardy–Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}}: V^{d, s}_{\infty\text{-XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
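For concreteness, the constrained least squares estimator and the rate stated above can be written schematically as follows (the squared-error risk notation and the loss normalization are illustrative assumptions, not quoted from the paper):

$$
\hat{f}_V \;\in\; \operatorname*{arg\,min}_{\substack{f \in \mathcal{F}^{d,s}_{\infty\text{-ST}} \\ V^{d,s}_{\infty\text{-XGB}}(f) \le V}} \;\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2, \qquad \mathbb{E}\bigl\|\hat{f}_V - f^{*}\bigr\|^2 \;\lesssim\; n^{-2/3}\,(\log n)^{4(\min(s,d)-1)/3}.
$$

As a practical point of reference, the depth bound on the trees and the $L^1$ leaf-weight penalty that the paper's construction generalizes correspond to standard XGBoost hyperparameters. The sketch below is a minimal illustration assuming that max_depth plays the role of the depth parameter and reg_alpha is the $L^1$ penalty in question; the data and parameter values are arbitrary and not taken from the paper.

```python
import numpy as np
import xgboost as xgb

# Toy regression data; values are arbitrary and purely illustrative.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)

# Assumed mapping (not stated in the abstract): max_depth stands in for the
# depth bound on each tree, and reg_alpha is the L1 penalty on leaf weights
# that the complexity measure V^{d,s}_{inf-XGB} generalizes.
model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=2,
    learning_rate=0.1,
    reg_alpha=1.0,   # L1 regularization on leaf weights
    reg_lambda=0.0,  # disable the default L2 penalty for a cleaner comparison
)
model.fit(X, y)
```
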
Problem

Research questions and friction points this paper is trying to address.

XGBoost
function class
theoretical foundation
Hardy–Krause variation
minimax optimality
Innovation

Methods, ideas, or system contributions that make the work stand out.

XGBoost
function class
Hardy–Krause variation
minimax optimality
regularization