A duality framework for analyzing random feature and two-layer neural networks

📅 2023-05-09
📈 Citations: 2
Influential: 1
📄 PDF
🤖 AI Summary
This paper investigates function approximation and estimation for random feature models (RFMs) and two-layer neural networks in the $\mathcal{F}_{p,\pi}$ and Barron spaces. We introduce *information complexity* (I-complexity), a novel complexity measure, and develop a unified theoretical framework grounded in duality analysis. Our main contributions are: (1) establishing, for the first time, a dual equivalence between approximation and estimation errors; (2) overcoming limitations of kernel methods by deriving dimension-free, tight learning bounds for RFMs when $p > 1$; (3) achieving spectrum-dependent error characterization for RKHS functions under the $L^\infty$ norm and proving near-optimality of kernel ridge regression in this norm; and (4) demonstrating that I-complexity yields tight upper bounds in the noiseless setting and matches the Le Cam lower bound in the noisy setting.
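For orientation, the $\mathcal{F}_{p,\pi}$ spaces appearing above are usually defined through an expectation over random features. The display below is a sketch of the standard definition from the random-feature literature, stated here for context rather than quoted from the paper:

$$
f(x) \;=\; \mathbb{E}_{v \sim \pi}\big[a(v)\,\sigma(\langle v, x\rangle)\big],
\qquad
\|f\|_{\mathcal{F}_{p,\pi}} \;=\; \inf_{a}\,\|a\|_{L^p(\pi)},
$$

where the infimum runs over all coefficient functions $a$ representing $f$ and $\sigma$ is the activation. Under this convention, $p=2$ recovers the RKHS induced by the random-feature kernel, smaller $p$ gives strictly larger classes, and the Barron space is closely related to the $p=1$ case.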
📝 Abstract
We consider the problem of learning functions within the $\mathcal{F}_{p,\pi}$ and Barron spaces, which play crucial roles in understanding random feature models (RFMs), two-layer neural networks, as well as kernel methods. Leveraging tools from information-based complexity (IBC), we establish a dual equivalence between approximation and estimation, and then apply it to study the learning of the preceding function spaces. The duality allows us to focus on the more tractable problem between approximation and estimation. To showcase the efficacy of our duality framework, we delve into two important but under-explored problems: 1) Random feature learning beyond kernel regime: We derive sharp bounds for learning $\mathcal{F}_{p,\pi}$ using RFMs. Notably, the learning is efficient without the curse of dimensionality for $p>1$. This underscores the extended applicability of RFMs beyond the traditional kernel regime, since $\mathcal{F}_{p,\pi}$ with $p<2$ is strictly larger than the corresponding reproducing kernel Hilbert space (RKHS) where $p=2$. 2) The $L^\infty$ learning of RKHS: We establish sharp, spectrum-dependent characterizations for the convergence of $L^\infty$ learning error in both noiseless and noisy settings. Surprisingly, we show that popular kernel ridge regression can achieve near-optimal performance in $L^\infty$ learning, despite it primarily minimizing square loss. To establish the aforementioned duality, we introduce a type of IBC, termed $I$-complexity, to measure the size of a function class. Notably, $I$-complexity offers a tight characterization of learning in noiseless settings, yields lower bounds comparable to Le Cam's in noisy settings, and is versatile in deriving upper bounds. We believe that our duality framework holds potential for broad application in learning analysis across more scenarios.
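The $L^\infty$ claim is easy to state operationally: kernel ridge regression is trained by minimizing a regularized square loss, yet the paper argues it can be near-optimal under the uniform norm. The snippet below is a generic KRR fit with an empirical sup-norm check; the Gaussian kernel, bandwidth, regularization value, and synthetic data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal kernel ridge regression (KRR) sketch, evaluated in the sup norm.
# Illustrative only: kernel, bandwidth, and data are arbitrary choices.
rng = np.random.default_rng(1)

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * bandwidth**2))

n, d, lam = 200, 5, 1e-3
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(n)  # noisy target depending on one coordinate

# KRR estimator: minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_RKHS^2,
# whose solution is f(x) = k(x, X) @ alpha with alpha computed below.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_new):
    return rbf_kernel(X_new, X) @ alpha

# Empirical proxy for the L-infinity error: sup over a fresh test sample.
X_test = rng.standard_normal((1000, d))
sup_err = np.max(np.abs(predict(X_test) - np.sin(X_test[:, 0])))
print(f"empirical sup-norm error: {sup_err:.3f}")
```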
Problem

Research questions and friction points this paper is trying to address.

Analyzing learning in random feature models
Studying function spaces of two-layer neural networks
Establishing duality between approximation and estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Duality framework for neural networks
Sharp bounds for random feature learning (see the code sketch after this list)
I-complexity for function class measurement
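To make the random feature setting concrete, the sketch below (assumed names and sizes, not the paper's code) fits a frozen-feature ReLU model by ridge regression on the outer coefficients, which is the kernel-regime ($p=2$) baseline discussed in the abstract.

```python
import numpy as np

# Illustrative random feature model (RFM) sketch: f(x) = sum_j a_j * relu(<v_j, x>),
# with frozen random weights v_j and ridge-regularized least squares on a.
# Names and sizes (n_features, ridge_lam, the synthetic target) are arbitrary.
rng = np.random.default_rng(0)

d, n, n_features, ridge_lam = 10, 500, 1000, 1e-3

# Synthetic data: inputs on the sphere, noisy single-neuron target.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0) + 0.01 * rng.standard_normal(n)

# Random features: inner weights v_j are sampled once and never trained.
V = rng.standard_normal((n_features, d))
Phi = np.maximum(X @ V.T, 0.0) / np.sqrt(n_features)   # feature matrix, shape (n, n_features)

# Ridge regression on the outer coefficients (an L2 penalty on a, i.e. the p = 2 case).
a = np.linalg.solve(Phi.T @ Phi + ridge_lam * np.eye(n_features), Phi.T @ y)

def predict(X_new):
    """Evaluate the fitted random feature model on new inputs."""
    return (np.maximum(X_new @ V.T, 0.0) / np.sqrt(n_features)) @ a

train_mse = np.mean((predict(X) - y) ** 2)
print(f"train MSE: {train_mse:.4e}")
```

Only the outer coefficients `a` are trained; the inner weights `V` stay at their random draw from $\pi$. That frozen-feature structure is what separates an RFM from a fully trained two-layer network, and the abstract's point is that such models can still learn $\mathcal{F}_{p,\pi}$ targets with $p>1$ without the curse of dimensionality.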
Hongrui Chen
Department of Mathematics, Stanford University
Jihao Long
Researcher, Institute for Advanced Algorithm Research, Shanghai
Optimal Control · Reinforcement Learning · Machine Learning
Lei Wu
School of Mathematical Sciences and Center for Machine Learning Research, Peking University