When Does $\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\ell_1$ Implicit Bias

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates whether $\ell_2$-Boosting, whose implicit bias is toward $\ell_1$-type solutions, can overfit benignly in high dimensions, covering an isotropic pure-noise model and spiked-isotropic designs. Using a continuous-time boosting model and high-dimensional asymptotic analysis that combines the Convex Gaussian Minimax Theorem with asymptotic expansions of two-sided truncated Gaussian moments, the paper gives an analytical characterization of the risk of the nonsmooth $\ell_1$-interpolating solution. The theory shows that the $\ell_1$ bias slows benign overfitting: under isotropic pure noise, the excess variance decays only as $\Theta(\sigma^2 / \log(p/n))$, and under spiked-isotropic designs with $r_2 \gg n$ tail dimensions, the risk converges to zero only at the logarithmic rate $\Theta(\sigma^2 / \log(r_2/n))$. To avoid this slow convergence, the authors derive a tuning-free early stopping rule that, under a bounded $\ell_1$-path condition, attains the minimax-optimal empirical prediction rate for $\ell_1$-bounded signals.

📝 Abstract

Benign overfitting is well-characterized in $\ell_2$ geometries, but its behavior under the $\ell_1$ implicit bias of greedy ensembles remains challenging. The analytical barrier stems from the non-linear coupling of coordinate selection thresholds, which invalidates standard spectral resolvent tools. To isolate this algorithmic bias, we characterize the high-dimensional risk of continuous-time $\ell_2$-Boosting over $p$ features and $n$ samples. By coupling the Convex Gaussian Minimax Theorem with delicate asymptotic expansions of double-sided truncated Gaussian moments, we analytically resolve the non-smooth $\ell_1$ interpolant. Under an isotropic pure-noise model, we prove that benign overfitting fails at the linear rate: greedy selection localizes noise into sparse active sets, and the excess variance decays at a logarithmic rate $\Theta(\sigma^2/\log(p/n))$ for noise variance $\sigma^2$. We remark that while this localization mechanism should persist in the presence of signals, the exact signal-noise decomposition remains an open problem. For spiked-isotropic designs with $k^*$ head eigenvalues and $r_2 = p - k^*$ tail dimensions, the risk converges to zero when $r_2 \gg n$, but only at a logarithmic rate $\Theta(\sigma^2/\log(r_2/n))$, which is slower than the linear decay observed in $\ell_2$ geometries. To avoid this slow convergence, we analyze the non-smooth subdifferential dynamics of the boosting flow. This yields a tuning-free early stopping rule that, under a bounded $\ell_1$-path condition, recovers the Lasso basic inequality and attains the minimax-optimal empirical prediction rate for $\ell_1$-bounded signals.
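
To make the setting concrete, the following is a minimal sketch of discrete-time componentwise $\ell_2$-Boosting on an isotropic pure-noise problem. It is an illustration under stated assumptions, not the paper's construction: the small shrinkage `step` is only a stand-in for the continuous-time flow, the function name `l2_boost` and the sizes `n`, `p`, `sigma` are hypothetical choices, and the paper's early stopping rule is not reproduced here.

```python
import numpy as np


def l2_boost(X, y, step=0.01, n_iter=5000):
    """Componentwise L2-Boosting with shrinkage `step` (illustrative sketch).

    At each iteration, pick the feature most correlated with the current
    residual and take a small least-squares step along that coordinate.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0)          # squared column norms
    train_mse = []
    for _ in range(n_iter):
        corr = X.T @ resid                 # correlation with the residual
        j = int(np.argmax(np.abs(corr)))   # greedy coordinate selection
        delta = step * corr[j] / col_sq[j] # shrunken least-squares update
        beta[j] += delta
        resid -= delta * X[:, j]
        train_mse.append(float(resid @ resid) / n)
    return beta, np.array(train_mse)


# Assumed toy setup: isotropic Gaussian design, response is pure noise.
rng = np.random.default_rng(0)
n, p, sigma = 50, 500, 1.0
X = rng.standard_normal((n, p))
y = sigma * rng.standard_normal(n)

beta, train_mse = l2_boost(X, y)

# With an isotropic design and zero true signal, the excess prediction
# risk of a fixed coefficient vector is its squared Euclidean norm.
excess_risk = float(beta @ beta)
l1_norm = float(np.abs(beta).sum())
active_set_size = int(np.count_nonzero(beta))
```

Running the loop for longer drives the training error toward zero while `excess_risk` grows, which is the overfitting regime the abstract analyzes; stopping the loop early is the kind of intervention the early stopping rule formalizes.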

Problem

Research questions and friction points this paper is trying to address.

benign overfitting

ℓ₁ implicit bias

ℓ₂-Boosting

high-dimensional risk

greedy ensembles

Innovation

Methods, ideas, or system contributions that make the work stand out.

benign overfitting

ℓ₁ implicit bias

high-dimensional asymptotics