A result relating convex n-widths to covering numbers with some applications to neural networks

📅 2025-12-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the curse of dimensionality in high-dimensional function approximation, specifically the question of why certain high-dimensional function classes admit accurate linear approximation using only a small number of features. Method: the authors introduce the notion of a "convex core" and establish a quantitative relationship between the convex $n$-width of a function class and the metric entropy (covering number) of that core, characterizing the optimal linear approximation error in terms of the covering number of the convex core. The analysis combines convex analysis, metric entropy theory, and approximation theory to examine how the geometric structure of the hidden-node function class in single-hidden-layer neural networks governs overall approximation capacity. Contribution/Results: the paper derives upper bounds on approximation rates: when the covering number of the hidden-node function class grows slowly, the approximation error decays polynomially, or even exponentially, in the number of features. This framework provides a theoretical foundation for understanding why neural networks and related models can succeed in high-dimensional settings.
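As a toy illustration of the covering-number (metric entropy) quantity the summary refers to, and not the paper's construction, consider a simple grid $\varepsilon$-net of the unit cube under the sup-norm. It shows the standard fact that the covering number of a $d$-dimensional ball of functions or vectors typically scales like $\varepsilon^{-d}$, so its metric entropy grows linearly in $d$:

```python
import math

def cube_covering_number(eps: float, d: int) -> int:
    """Size of a grid eps-net for [0,1]^d in the sup-norm.

    Centers spaced 2*eps apart along each axis cover the cube,
    giving N(eps) = ceil(1/(2*eps))**d, so the metric entropy
    log N(eps) = d * log(ceil(1/(2*eps))) grows linearly in d.
    """
    per_axis = math.ceil(1.0 / (2.0 * eps))
    return per_axis ** d

for d in (1, 5, 10):
    n = cube_covering_number(0.1, d)
    print(f"d={d:2d}  N(0.1)={n}  log N = {math.log(n):.1f}")
```

Function classes whose covering numbers grow much more slowly than this generic $\varepsilon^{-d}$ behavior are exactly the ones the paper's bounds reward with fast approximation rates.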

📝 Abstract
In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or ``features'' is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta(n^{-1/d})$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the ``curse of dimensionality'' associated with more general classes. It is natural, then, to look for characterizations of high-dimensional function classes that nevertheless are approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error of approximation of a function class to the covering number of its ``convex core''. For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper bound the covering numbers of the convex core. Hence, using standard results we obtain upper bounds on the approximation rate of neural network classes.
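The practical gap the abstract describes can be made concrete with a back-of-the-envelope calculation (a sketch, not from the paper): inverting an error rate $n^{-r} \le \varepsilon$ gives a required feature budget $n \ge \varepsilon^{-1/r}$. Below, the worst-case rate $\Theta(n^{-1/d})$ is taken from the abstract, while the dimension-free $n^{-1/2}$ benchmark is the standard Maurey/Barron-type rate for convex hulls, assumed here for comparison:

```python
import math

def features_needed(eps: float, rate: float) -> int:
    """Smallest n with n**(-rate) <= eps, i.e. n = ceil(eps**(-1/rate))."""
    return math.ceil(eps ** (-1.0 / rate))

eps, d = 0.1, 10
# Worst-case rate Theta(n^{-1/d}): budget blows up like eps**(-d)
print("curse of dimensionality:", features_needed(eps, 1.0 / d))
# Dimension-free n^{-1/2} benchmark: budget is only eps**(-2)
print("dimension-free rate:    ", features_needed(eps, 0.5))
```

Even at the modest target $\varepsilon = 0.1$ in $d = 10$ dimensions, the worst-case rate demands on the order of $10^{10}$ features while the dimension-free rate needs about $10^2$, which is why characterizing the classes that enjoy the faster rate matters.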
Problem

Research questions and friction points this paper is trying to address.

Relates convex n-widths to covering numbers for function approximation
Addresses curse of dimensionality in high-dimensional pattern recognition
Bounds approximation rates for one-hidden-layer neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Relates convex n-widths to covering numbers
Uses covering numbers of convex core
Applies to one-hidden-layer neural networks
Jonathan Baxter
Department of Systems Engineering, Australian National University, Canberra 0200, Australia
Peter Bartlett
Professor, EECS and Statistics, UC Berkeley
machine learning · statistical learning theory · adaptive control