Online Realizable Regression and Applications for ReLU Networks

📅 2026-02-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work studies when finite cumulative loss is achievable in adversarial realizable online regression, without margin or stochasticity assumptions. By introducing an entropy potential method, it upper-bounds the online (scaled Littlestone) dimension by a Dudley-type entropy integral that depends only on the covering numbers of the hypothesis class, revealing a fundamental distinction between regression and classification in terms of learnability. The main contributions are: proving that polynomial metric entropy suffices to guarantee finite cumulative loss with no dependence on the horizon; establishing a sharp q-vs-d phase transition for Lipschitz regression; and showing that bounded-norm k-ReLU networks achieve finite regret in regression (e.g., O(1) for a single ReLU), whereas under the same conditions classification is impossible.
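
As a quick numerical illustration (our sketch, not code from the paper), the snippet below evaluates the entropy potential $\Phi(\mathcal{H})=\int_0^{\mathrm{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$ for an assumed Lipschitz covering-number model and shows how its finiteness tracks the claimed q-vs-d threshold. The function names and the covering-number model $\log N(\varepsilon) \approx L^d \varepsilon^{-d/q}$ are ours, chosen only for illustration.

```python
# Minimal numerical sketch (not code from the paper) of the entropy potential
# Phi(H) = integral_0^diam log N(H, eps) d eps, under a *hypothetical* model:
# for L-Lipschitz regression on [0,1]^d with loss |y - yhat|^q, assume
# log N(eps) ~ L^d * eps^(-d/q) in the loss-induced sup pseudo-metric.
# Finiteness of the integral then tracks the q-vs-d dichotomy.
import numpy as np

def entropy_potential(log_cover, diam=1.0, n_grid=1_000_000):
    """Midpoint-rule estimate of the entropy integral of log_cover over (0, diam]."""
    edges = np.linspace(0.0, diam, n_grid + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])   # midpoints avoid evaluating at eps = 0
    return float(np.sum(log_cover(mids)) * (diam / n_grid))

def lipschitz_log_cover(L, d, q):
    """Assumed covering-number model: log N(eps) = L**d * eps**(-d/q)."""
    return lambda eps: (L ** d) * eps ** (-d / q)

L, d = 1.0, 2
for q in (1, 2, 3, 4):
    phi = entropy_potential(lipschitz_log_cover(L, d, q))
    verdict = "finite (q > d)" if q > d else "diverges as the grid is refined (q <= d)"
    print(f"q={q}: Phi estimate = {phi:.3f}  [{verdict}]")
```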

📝 Abstract
Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral depending only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an \emph{entropy potential} $\Phi(\mathcal{H})=\int_{0}^{\mathrm{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on the effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning of $L$-Lipschitz regression: the total loss is finite, efficiently achievable, and of order $\Theta_{d,q}(L^d)$ if and only if $q>d$, and is infinite otherwise. For bounded-norm $k$-ReLU networks we separate regression, where the loss is finite and in fact $\widetilde O(k^2)$, with $O(1)$ for a single ReLU, from classification, which is impossible already for $k=2$, $d=1$.
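
The $q$-vs.-$d$ dichotomy can be sanity-checked directly from the entropy potential. The worked sketch below is ours, under the assumed (not stated above) conventions that $q$ is the loss exponent, $\ell(y,\hat y)=|y-\hat y|^q$, and the hypotheses are $L$-Lipschitz functions $[0,1]^d \to [0,1]$:

```latex
% Worked sketch (our reconstruction, not copied from the paper).
% Kolmogorov--Tikhomirov: \log N_\infty(\mathcal{H},\delta) \asymp (L/\delta)^d in sup norm,
% so in the loss-induced sup pseudo-metric (scale \varepsilon = \delta^q):
%   \log N(\mathcal{H},\varepsilon) \asymp L^d \varepsilon^{-d/q}.
\Phi(\mathcal{H})
  = \int_{0}^{\mathrm{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\, d\varepsilon
  \asymp \int_{0}^{1} L^{d}\,\varepsilon^{-d/q}\, d\varepsilon
  = \begin{cases}
      \dfrac{q}{q-d}\, L^{d} \;=\; \Theta_{d,q}(L^{d}) & \text{if } q > d,\\[1.5ex]
      +\infty & \text{if } q \le d.
    \end{cases}
```

So the bound $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$ is finite exactly when $q>d$, consistent with the dichotomy stated in the abstract.
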
Problem

Research questions and friction points this paper is trying to address.

online regression
realizability
adversarial model
cumulative loss
covering numbers
Innovation

Methods, ideas, or system contributions that make the work stand out.

realizable online regression
entropy potential
Dudley-type entropy integral
approximate pseudo-metric
ReLU networks
🔎 Similar Papers
No similar papers found.