Convex Formulations for Training Two-Layer ReLU Neural Networks

📅 2024-10-29
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0

🤖 AI Summary
Training two-layer ReLU neural networks is a non-convex problem, which poses theoretical and computational challenges. Method: This work reformulates the training of infinite-width two-layer ReLU networks as a convex completely positive program (CPP) in a finite-dimensional lifted space. Because the complete positivity constraint leaves the CPP NP-hard, the authors introduce a semidefinite programming (SDP) relaxation that can be solved in polynomial time. Contribution/Results: Theoretically, the paper connects non-convex neural network training to convex optimization. Empirically, the tightness of the relaxation is evaluated experimentally, and the SDP-based approach achieves competitive test accuracy across a range of classification tasks. The result is a convex analytical framework for training two-layer ReLU networks, with a polynomial-time relaxation as its computational route.
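
To make the non-convexity concrete, here is a minimal sketch (an illustration, not taken from the paper). It assumes a toy network with scalar input, two hidden ReLU units, no biases, and a squared loss on a single training point; the names `predict` and `squared_loss` are hypothetical. Swapping the two hidden units gives two parameter settings with zero loss, yet their midpoint has positive loss, which a convex function cannot do.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def predict(w, v, x):
    """Two-layer ReLU network with scalar input: sum_j v_j * relu(w_j * x)."""
    return sum(vj * relu(wj * x) for wj, vj in zip(w, v))


def squared_loss(w, v, data):
    """Sum of squared errors over (x, y) pairs."""
    return sum((predict(w, v, x) - y) ** 2 for x, y in data)


# Toy data set (an assumption for illustration): one point with x = 1, y = 1.
data = [(1.0, 1.0)]

# Two parameter settings that differ only by swapping the hidden units.
theta_a = ([1.0, 0.0], [1.0, 0.0])  # (first-layer weights, second-layer weights)
theta_b = ([0.0, 1.0], [0.0, 1.0])

# Midpoint of the two settings in parameter space.
theta_mid = tuple(
    [0.5 * (a + b) for a, b in zip(part_a, part_b)]
    for part_a, part_b in zip(theta_a, theta_b)
)

loss_a = squared_loss(*theta_a, data)
loss_b = squared_loss(*theta_b, data)
loss_mid = squared_loss(*theta_mid, data)

# Convexity would require loss_mid <= 0.5 * (loss_a + loss_b).
violates_convexity = loss_mid > 0.5 * (loss_a + loss_b)
print(loss_a, loss_b, loss_mid, violates_convexity)
```

Both endpoints fit the data exactly, so any convex loss would also be zero at their midpoint. The positive midpoint loss shows that the loss, viewed as a function of the weights, is not convex.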

πŸ“ Abstract
Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response to this challenge, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this challenge, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks.

Problem

Research questions and friction points this paper is trying to address.

Reformulates the training of infinite-width two-layer ReLU networks as a convex completely positive program.
Introduces a semidefinite relaxation that can be solved in polynomial time.
Evaluates the tightness of the relaxation and its test accuracy on classification tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Convex reformulation of ReLU network training
Semidefinite relaxation for polynomial-time solution
Competitive test accuracy in classification tasks
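
The complete positivity constraint and its semidefinite relaxation can be illustrated with a small sketch. A matrix is completely positive when it can be written as a sum of terms b bᵀ with entrywise nonnegative vectors b. A standard tractable relaxation, assumed here for illustration because this page does not spell out the paper's exact constraints, keeps only two necessary conditions: positive semidefiniteness and entrywise nonnegativity (the doubly nonnegative cone). The helper names below are hypothetical, and the code uses only the Python standard library.

```python
def outer_sum(factors):
    """Return sum_k b_k b_k^T for a list of equal-length vectors b_k."""
    n = len(factors[0])
    return [
        [sum(b[i] * b[j] for b in factors) for j in range(n)]
        for i in range(n)
    ]


def is_entrywise_nonnegative(m, tol=1e-9):
    """Check that every entry is >= 0 up to a tolerance."""
    return all(entry >= -tol for row in m for entry in row)


def is_psd(m, tol=1e-9):
    """Check positive semidefiniteness of a symmetric matrix by
    symmetric elimination (successive Schur complements)."""
    a = [row[:] for row in m]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            # A zero pivot is only allowed if the rest of its column is zero.
            if any(abs(a[i][k]) > tol for i in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True


def in_relaxed_cone(m):
    """Membership in the doubly nonnegative cone: PSD and entrywise >= 0."""
    return is_psd(m) and is_entrywise_nonnegative(m)


# A completely positive matrix built from nonnegative factors.
factors = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
x_cp = outer_sum(factors)

# Entrywise nonnegative but not PSD, so it is rejected by the relaxation
# and therefore cannot be completely positive either.
x_bad = [[1.0, 2.0], [2.0, 1.0]]

print(in_relaxed_cone(x_cp), in_relaxed_cone(x_bad))
```

Every completely positive matrix passes both checks, but for matrices of size 5 and larger some matrices pass without being completely positive. That gap is why the tightness of such a relaxation has to be assessed rather than assumed.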