A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks

📅 2024-11-05
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work addresses the challenge of deriving generalization bounds for parallel positively homogeneous neural networks, a class that includes deep linear/ReLU networks, single-layer multi-head attention, and matrix/tensor decompositions. We propose the first unified convex relaxation framework, embedding the non-convex empirical risk minimization (ERM) problem into a convex space of prediction functions. By introducing a controllable bias term, our approach enables generalization analysis without relying on conventional parameter-norm or complexity-based regularizers. Leveraging the theory of positively homogeneous functions and empirical process techniques, we derive structured risk bounds that yield near-linear sample complexity in network width across multiple model classes, substantially improving upon existing results. Our framework establishes a novel theoretical pathway for analyzing generalization in non-convex neural networks, offering both conceptual unification and quantitative advances.
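To fix ideas, a parallel positively homogeneous network can be written schematically as below; the branch map φ, width r, and homogeneity degree L are generic placeholders for illustration, not the paper's exact notation:

```latex
% A width-r parallel network: the input-output map is a sum of branches,
% each positively homogeneous of some degree L in its own parameters.
f(x;\theta_1,\dots,\theta_r) \;=\; \sum_{i=1}^{r} \phi(x;\theta_i),
\qquad
\phi(x;\alpha\theta_i) \;=\; \alpha^{L}\,\phi(x;\theta_i)
\quad \text{for all } \alpha > 0 .
```

For instance, each branch of a two-layer ReLU network, $v_i\,\mathrm{ReLU}(w_i^\top x)$, is positively homogeneous of degree 2 in its parameters $(w_i, v_i)$.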

📝 Abstract
We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks--a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our general framework is based on linking the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower-bound to the ERM problem. We exploit this convex lower-bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our general framework to a wide variety of models ranging from low-rank matrix sensing, to structured matrix sensing, two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, achieving generalization bounds with a sample complexity that scales almost linearly with the network width.
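The central link described in the abstract can be sketched as the following inequality; the convex function class $\mathcal{G}$, loss $\ell$, and sample $(x_j, y_j)_{j=1}^{n}$ are illustrative notation rather than the paper's:

```latex
% Non-convex ERM over network parameters is lower-bounded by a convex
% program over prediction functions g in a convex class G that contains
% every function the width-r network can realize.
\min_{\theta_1,\dots,\theta_r}\;
\frac{1}{n}\sum_{j=1}^{n}
  \ell\bigl(y_j,\, f(x_j;\theta_1,\dots,\theta_r)\bigr)
\;\;\ge\;\;
\min_{g\in\mathcal{G}}\;
\frac{1}{n}\sum_{j=1}^{n} \ell\bigl(y_j,\, g(x_j)\bigr) .
```

Because the abstract states this lower bound is global and achievable, generalization analysis can be carried out in the convex class and then transferred back to the non-convex network by controlling the discrepancy between the two.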
Problem

Research questions and friction points this paper is trying to address.

Derive generalization bounds for parallel positively homogeneous networks (a runnable toy example follows this list).
Link the non-convex ERM problem to a convex problem over prediction functions that provides a global, achievable lower bound.
Apply the framework to a range of models, obtaining sample complexity that scales near-linearly with network width.
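A minimal, self-contained sketch of one model in this class, a width-r two-layer ReLU network, with a numerical check of degree-2 positive homogeneity. This is illustrative code under the definitions above, not the paper's implementation:

```python
import numpy as np

def parallel_relu_net(x, W, v):
    """f(x) = sum_i v_i * relu(w_i . x): a width-r two-layer ReLU network.

    Each branch (w_i, v_i) -> v_i * relu(w_i . x) is positively
    homogeneous of degree 2 in its own parameters.
    """
    return np.maximum(W @ x, 0.0) @ v  # ReLU per branch, then sum over branches

rng = np.random.default_rng(0)
d, r = 5, 8                        # input dimension, network width
x = rng.normal(size=d)
W = rng.normal(size=(r, d))        # row i holds branch i's first-layer weights
v = rng.normal(size=r)             # second-layer weight of each branch

alpha = 3.0                        # any positive scale
scaled = parallel_relu_net(x, alpha * W, alpha * v)
expected = alpha**2 * parallel_relu_net(x, W, v)
print(np.isclose(scaled, expected))  # True: degree-2 positive homogeneity
```

Scaling all of a branch's parameters by α > 0 multiplies its output by α², which is exactly the positive homogeneity the framework exploits.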
Innovation

Methods, ideas, or system contributions that make the work stand out.

A convex relaxation links the non-convex ERM problem to a convex problem over prediction functions.
Generalization bounds are derived in the convex space while controlling the discrepancy with the non-convex model (shape sketched after this list).
Sample complexity scales almost linearly with network width.
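Schematically, the strategy described above suggests a bound of the following shape; the notation ($R$, $\hat R_n$, $\mathcal{G}$, $\Delta$) is an assumption made for illustration, not the paper's stated result:

```latex
% Risk of the learned network is (roughly) controlled by a uniform-deviation
% term over the convex class plus the convex/non-convex discrepancy.
R\bigl(f(\cdot;\hat\Theta)\bigr)
\;\lesssim\;
\sup_{g\in\mathcal{G}} \bigl| R(g) - \hat R_n(g) \bigr|
\;+\;
\Delta\bigl(\mathcal{G},\, f(\cdot;\hat\Theta)\bigr) ,
```

where the first term is handled with empirical process techniques in the convex space and the second plays the role of the controllable bias term mentioned in the summary.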